Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 29_492_188 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-146m60b100m --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 29_492_188 --lr-warmup-samples 294_922 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 100 --save-interval 10000 --eval-interval 10000 --eval-iters 1 --tensorboard-dir tensorboard_146m60b100m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m60b100m --load checkpoints_146m60b100m --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3324364.json --zero-stage 0 START 3324364: Thu 16 Mar 2023 06:53:17 PM EET 0: 0: 0: ======================= ROCm System Management Interface ======================= 0: ================================= Concise Info ================================= 0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 0: 0 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 0: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 0: 2 44.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 0: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 0: 4 48.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 0: 5 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 0: 6 45.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 0: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 0: 
================================================================================ 0: ============================= End of ROCm SMI Log ============================== 6: 6: 6: ======================= ROCm System Management Interface ======================= 6: ================================= Concise Info ================================= 6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 6: 0 49.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: 2 45.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: 4 41.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: 6 39.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: ================================================================================ 6: ============================= End of ROCm SMI Log ============================== 2: 2: 2: ======================= ROCm System Management Interface ======================= 2: ================================= Concise Info ================================= 2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 2: 0 44.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: 2 43.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: 4 46.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: 6 43.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: ================================================================================ 2: ============================= End of ROCm SMI Log ============================== 5: 5: 5: ======================= ROCm System Management Interface ======================= 5: ================================= Concise Info ================================= 5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap 
VRAM% GPU% 5: 0 45.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 5: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 5: 2 42.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 5: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 5: 4 50.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 5: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 5: 6 46.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 5: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 5: ================================================================================ 5: ============================= End of ROCm SMI Log ============================== 3: 3: 3: ======================= ROCm System Management Interface ======================= 3: ================================= Concise Info ================================= 3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 3: 0 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: 2 38.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: 4 41.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: 6 42.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: ================================================================================ 3: ============================= End of ROCm SMI Log ============================== 7: 7: 7: ======================= ROCm System Management Interface ======================= 7: ================================= Concise Info ================================= 7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 7: 0 42.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: 2 40.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: 4 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 5 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: 6 37.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 7 45.0c N/A 
800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: ================================================================================ 7: ============================= End of ROCm SMI Log ============================== 4: 4: 4: ======================= ROCm System Management Interface ======================= 4: ================================= Concise Info ================================= 4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 4: 0 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: 2 40.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 3 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: 4 45.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: 6 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: ================================================================================ 4: ============================= End of ROCm SMI Log ============================== 1: 1: 1: ======================= ROCm System Management Interface ======================= 1: ================================= Concise Info ================================= 1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 1: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: 2 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: 4 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: 6 40.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: ================================================================================ 1: ============================= End of ROCm SMI Log ============================== 7: Launching on nid006724 (7/8), master nid006717 port 9999, GPUs 8, CUDA: True 6: Launching on nid006723 (6/8), master nid006717 port 9999, GPUs 8, CUDA: True 3: Launching on nid006720 
(3/8), master nid006717 port 9999, GPUs 8, CUDA: True 5: Launching on nid006722 (5/8), master nid006717 port 9999, GPUs 8, CUDA: True 0: Launching on nid006717 (0/8), master nid006717 port 9999, GPUs 8, CUDA: True 4: Launching on nid006721 (4/8), master nid006717 port 9999, GPUs 8, CUDA: True 2: Launching on nid006719 (2/8), master nid006717 port 9999, GPUs 8, CUDA: True 1: Launching on nid006718 (1/8), master nid006717 port 9999, GPUs 8, CUDA: True 0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. 0: using torch.bfloat16 for parameters ... 0: ------------------------ arguments ------------------------ 0: abort_on_unmet_fused_kernel_constraints ......... False 0: accumulate_allreduce_grads_in_fp32 .............. True 0: adam_beta1 ...................................... 0.9 0: adam_beta2 ...................................... 0.999 0: adam_eps ........................................ 1e-08 0: adlr_autoresume ................................. False 0: adlr_autoresume_interval ........................ 1000 0: apply_query_key_layer_scaling ................... True 0: apply_residual_connection_post_layernorm ........ False 0: attention_dropout ............................... 0.1 0: attention_softmax_in_fp32 ....................... False 0: bert_binary_head ................................ True 0: bert_load ....................................... None 0: bf16 ............................................ True 0: bias_dropout_fusion ............................. True 0: bias_gelu_fusion ................................ True 0: biencoder_projection_dim ........................ 0 0: biencoder_shared_query_context_model ............ False 0: block_data_path ................................. None 0: checkpoint_activations .......................... True 0: checkpoint_in_cpu ............................... 
False 0: checkpoint_num_layers ........................... 1 0: clip_grad ....................................... 1.0 0: codecarbon_dir .................................. None 0: consumed_train_samples .......................... 0 0: consumed_train_tokens ........................... 0 0: consumed_valid_samples .......................... 0 0: contigious_checkpointing ........................ False 0: cpu_optimizer ................................... False 0: cpu_torch_adam .................................. False 0: curriculum_learning ............................. False 0: data_impl ....................................... mmap 0: data_parallel_size .............................. 64 0: data_path ....................................... None 0: dataloader_type ................................. single 0: DDP_impl ........................................ local 0: decoder_seq_length .............................. None 0: deepscale ....................................... False 0: deepscale_config ................................ None 0: deepspeed ....................................... True 0: deepspeed_activation_checkpointing .............. False 0: deepspeed_config ................................ ds_configs/3324364.json 0: deepspeed_mpi ................................... False 0: distribute_checkpointed_activations ............. False 0: distributed_backend ............................. nccl 0: embed_layernorm ................................. False 0: embedding_path .................................. None 0: encoder_seq_length .............................. 2048 0: eod_mask_loss ................................... False 0: eval_interval ................................... 10000 0: eval_iters ...................................... 1 0: eval_only ....................................... None 0: evidence_data_path .............................. None 0: exit_duration_in_mins ........................... None 0: exit_interval ................................... 
None 0: ffn_hidden_size ................................. 3072 0: finetune ........................................ False 0: fp16 ............................................ False 0: fp16_lm_cross_entropy ........................... False 0: fp32_residual_connection ........................ False 0: gigaflos_no_embeds .............................. 0 0: global_batch_size ............................... 256 0: glu_activation .................................. None 0: hidden_dropout .................................. 0.1 0: hidden_size ..................................... 768 0: hysteresis ...................................... 2 0: ict_head_size ................................... None 0: ict_load ........................................ None 0: img_dim ......................................... 224 0: indexer_batch_size .............................. 128 0: indexer_log_interval ............................ 1000 0: inference ....................................... False 0: init_method_std ................................. 0.02 0: init_method_xavier_uniform ...................... False 0: initial_loss_scale .............................. 4294967296 0: kill_switch_path ................................ kill-switch-146m60b100m 0: kv_channels ..................................... 64 0: layer_norm_fusion ............................... True 0: layernorm_epsilon ............................... 1e-05 0: lazy_mpu_init ................................... None 0: load ............................................ checkpoints_146m60b100m 0: local_rank ...................................... None 0: log_batch_size_to_tensorboard ................... True 0: log_interval .................................... 100 0: log_learning_rate_to_tensorboard ................ True 0: log_level ....................................... None 0: log_level_replica ............................... None 0: log_loss_scale_to_tensorboard ................... 
True 0: log_num_zeros_in_grad ........................... False 0: log_params_norm ................................. False 0: log_path ........................................ None 0: log_timers_to_tensorboard ....................... True 0: log_validation_ppl_to_tensorboard ............... True 0: loss_on_targets_only ............................ False 0: loss_scale ...................................... 12.0 0: loss_scale_window ............................... 1000 0: lr .............................................. 0.0002 0: lr_decay_iters .................................. None 0: lr_decay_samples ................................ 29492188 0: lr_decay_style .................................. cosine 0: lr_decay_tokens ................................. None 0: lr_warmup_fraction .............................. None 0: lr_warmup_iters ................................. 0 0: lr_warmup_samples ............................... 294922 0: make_vocab_size_divisible_by .................... 128 0: mask_prob ....................................... 0.15 0: masked_softmax_fusion ........................... True 0: max_position_embeddings ......................... 2048 0: mean_noise_span_length .......................... None 0: memory_centric_tiled_linear ..................... False 0: merge_file ...................................... gpt2/merges.txt 0: micro_batch_size ................................ 4 0: min_loss_scale .................................. 1.0 0: min_lr .......................................... 2e-05 0: mmap_warmup ..................................... False 0: no_load_optim ................................... None 0: no_load_rng ..................................... None 0: no_save_optim ................................... None 0: no_save_rng ..................................... None 0: noise_density ................................... None 0: num_attention_heads ............................. 12 0: num_channels .................................... 
3 0: num_classes ..................................... 1000 0: num_layers ...................................... 15 0: num_layers_per_virtual_pipeline_stage ........... None 0: num_workers ..................................... 2 0: onnx_safe ....................................... None 0: openai_gelu ..................................... False 0: optimizer ....................................... adam 0: optimizer_fusion ................................ True 0: override_lr_scheduler ........................... False 0: pad_vocab_size_to ............................... None 0: params_dtype .................................... torch.bfloat16 0: partition_activations ........................... False 0: patch_dim ....................................... 16 0: pipeline_model_parallel_size .................... 1 0: position_embedding_type ......................... PositionEmbeddingType.absolute 0: pp_partition_method ............................. None 0: profile_backward ................................ False 0: query_in_block_prob ............................. 0.1 0: rampup_batch_size ............................... None 0: rank ............................................ 0 0: remote_device ................................... none 0: reset_attention_mask ............................ False 0: reset_position_ids .............................. False 0: reset_progress .................................. None 0: retriever_report_topk_accuracies ................ [] 0: retriever_score_scaling ......................... False 0: retriever_seq_length ............................ 256 0: reweight_loss_based_on_position_frequency ....... False 0: sample_rate ..................................... 1.0 0: save ............................................ checkpoints_146m60b100m 0: save_interval ................................... 10000 0: scatter_gather_tensors_in_pipeline .............. True 0: scattered_embeddings ............................ 
False 0: seed ............................................ 1234 0: seq_length ...................................... 2048 0: sgd_momentum .................................... 0.9 0: short_seq_prob .................................. 0.1 0: skip_train_iteration_range ...................... None 0: split ........................................... None 0: split_transformers .............................. False 0: sync_tp_duplicated_parameters ................... False 0: synchronize_each_layer .......................... False 0: tensor_model_parallel_size ...................... 1 0: tensorboard_dir ................................. tensorboard_146m60b100m 0: tensorboard_log_interval ........................ 1 0: tensorboard_queue_size .......................... 5 0: test_weighted_split_paths ....................... None 0: test_weighted_split_paths_path .................. None 0: tile_factor ..................................... 1 0: titles_data_path ................................ None 0: tokenizer_name_or_path .......................... None 0: tokenizer_type .................................. GPT2BPETokenizer 0: train_iters ..................................... None 0: train_samples ................................... 29492188 0: train_tokens .................................... None 0: train_weighted_split_names ...................... ['train'] 0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] 0: train_weighted_split_paths_path ................. None 0: train_weighted_split_splits ..................... [['0:1']] 0: train_weighted_split_weights .................... [['1.0']] 0: universal_checkpoint ............................ False 0: use_bnb_optimizer ............................... False 0: use_checkpoint_lr_scheduler ..................... False 0: use_contiguous_buffers_in_ddp ................... True 0: use_cpu_initialization .......................... 
None 0: use_one_sent_docs ............................... False 0: use_pin_memory .................................. False 0: valid_num_workers ............................... 2 0: valid_weighted_split_names ...................... ['validation'] 0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] 0: valid_weighted_split_paths_path ................. None 0: valid_weighted_split_splits ..................... [['0:1']] 0: valid_weighted_split_weights .................... [['1.0']] 0: virtual_pipeline_model_parallel_size ............ None 0: vocab_extra_ids ................................. 0 0: vocab_file ...................................... gpt2/vocab.json 0: weight_decay .................................... 0.1 0: world_size ...................................... 64 0: zero_allgather_bucket_size ...................... 0.0 0: zero_contigious_gradients ....................... False 0: zero_reduce_bucket_size ......................... 0.0 0: zero_reduce_scatter ............................. False 0: zero_stage ...................................... 0 0: -------------------- end of arguments --------------------- 0: setting number of micro-batches to constant 1 0: > building GPT2BPETokenizer tokenizer ... 0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) 0: DeepSpeed general environment info: 0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] 0: torch version .................... 1.13.0+rocm5.2 0: torch cuda version ............... None 0: torch hip version ................ 5.2.21151-afdc89f8 0: nvcc version ..................... None 0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] 0: deepspeed info ................... 
0.7.5, unknown, unknown 0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 7: > setting tensorboard ... 0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** 0: > initializing torch distributed ... 0: [2023-03-16 18:54:11,052] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl 0: > initializing tensor model parallel with size 1 0: > initializing pipeline model parallel with size 1 0: > setting random seeds to 1234 ... 0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 0: > compiling dataset index builder ... 0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' 0: make: Nothing to be done for 'default'. 0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' 0: >>> done with dataset index builder. Compilation time: 0.114 seconds 0: > compiling and loading fused kernels ... 
0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 87 0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.o scaled_upper_triang_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 63 0: [1/1] c++ scaled_masked_softmax_hip.cuda.o scaled_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 67 0: [1/1] c++ layer_norm_hip_kernel.cuda.o layer_norm_cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so 0: >>> 
done with compiling and loading fused kernels. Compilation time: 26.023 seconds 0: time to initialize megatron (seconds): 79.925 0: [after megatron is initialized] datetime: 2023-03-16 18:54:39 0: building GPT model ... 0: [2023-03-16 18:54:40,061] [INFO] [utils.py:827:see_memory_usage] Before Building Model 0: [2023-03-16 18:54:40,062] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB 0: [2023-03-16 18:54:40,062] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.41 GB, percent = 6.2% 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi 0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, 
model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} 0: [2023-03-16 18:54:42,061] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer 0: stage=0 layers=22 0: 0: _to_float16 0: 1: EmbeddingPipe 0: 2: 0: 3: ParallelTransformerLayerPipe 0: 4: ParallelTransformerLayerPipe 0: 5: ParallelTransformerLayerPipe 0: 6: ParallelTransformerLayerPipe 0: 7: ParallelTransformerLayerPipe 0: 8: ParallelTransformerLayerPipe 0: 9: ParallelTransformerLayerPipe 0: 10: ParallelTransformerLayerPipe 0: 11: ParallelTransformerLayerPipe 0: 12: ParallelTransformerLayerPipe 0: 13: 
ParallelTransformerLayerPipe 0: 14: ParallelTransformerLayerPipe 0: 15: ParallelTransformerLayerPipe 0: 16: ParallelTransformerLayerPipe 0: 17: ParallelTransformerLayerPipe 0: 18: undo 0: 19: MixedFusedLayerNorm 0: 20: EmbeddingPipe 0: 21: float16_to_fp32 0: loss: CrossEntropy 0: [2023-03-16 18:54:42,479] [INFO] [utils.py:827:see_memory_usage] After Building Model 0: [2023-03-16 18:54:42,480] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.29 GB Max_CA 0 GB 0: [2023-03-16 18:54:42,480] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.43 GB, percent = 6.2% 0: setting training iterations to 115203 0: > learning rate decay style: cosine 0: DeepSpeed is enabled. 0: [2023-03-16 18:54:42,482] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown 0: [2023-03-16 18:54:55,818] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False 0: [2023-03-16 18:54:55,819] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer 0: [2023-03-16 18:54:55,819] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer 0: [2023-03-16 18:54:55,824] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam 0: [2023-03-16 18:54:55,824] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer 0: [2023-03-16 18:54:55,945] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer 0: [2023-03-16 18:54:55,946] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB 0: [2023-03-16 18:54:55,946] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.11 GB, percent = 6.4% 3: ninja: no work to do. 
0: Time to load utils op: 0.20101118087768555 seconds 0: Time to load utils op: 0.29367613792419434 secondsTime to load utils op: 0.2927970886230469 seconds 0: 0: Time to load utils op: 0.2942619323730469 seconds 0: Time to load utils op: 0.2941570281982422 seconds 0: Time to load utils op: 0.29441165924072266 seconds 0: Time to load utils op: 0.2934088706970215 seconds 0: Time to load utils op: 0.29189491271972656 seconds 3: Time to load utils op: 0.2928340435028076 secondsTime to load utils op: 0.29289913177490234 seconds 3: 3: Time to load utils op: 0.2928318977355957 seconds 3: Time to load utils op: 0.2933230400085449 secondsTime to load utils op: 0.18764233589172363 seconds 3: 3: Time to load utils op: 0.1876358985900879 secondsTime to load utils op: 0.18844890594482422 seconds 3: 3: Time to load utils op: 0.3230123519897461 seconds 1: Time to load utils op: 0.29088401794433594 seconds 1: Time to load utils op: 0.2908895015716553 seconds 1: Time to load utils op: 0.2908928394317627 seconds 1: Time to load utils op: 0.29090404510498047 seconds 1: Time to load utils op: 0.2909224033355713 seconds 1: Time to load utils op: 0.2909281253814697 seconds 1: Time to load utils op: 0.29091596603393555 seconds 1: Time to load utils op: 0.29093360900878906 seconds 2: Time to load utils op: 0.29103803634643555 seconds 2: Time to load utils op: 0.29105520248413086 seconds 2: Time to load utils op: 0.29108452796936035 seconds 2: Time to load utils op: 0.2911107540130615 secondsTime to load utils op: 0.2911109924316406 seconds 2: 2: Time to load utils op: 0.2911250591278076 seconds 2: Time to load utils op: 0.29113221168518066 seconds 2: Time to load utils op: 0.29114794731140137 seconds 7: Time to load utils op: 0.18056249618530273 seconds 7: Time to load utils op: 0.1806015968322754 secondsTime to load utils op: 0.18061041831970215 seconds 7: 7: Time to load utils op: 0.18060874938964844 seconds 7: Time to load utils op: 0.1806187629699707 seconds 7: Time to load utils op: 
0.18062472343444824 secondsTime to load utils op: 0.1806340217590332 seconds 7: 7: Time to load utils op: 0.180633544921875 seconds 5: Time to load utils op: 0.317767858505249 seconds 5: Time to load utils op: 0.1876835823059082 seconds 5: Time to load utils op: 0.18785500526428223 seconds 5: Time to load utils op: 0.187880277633667 secondsTime to load utils op: 0.1877584457397461 seconds 5: 5: Time to load utils op: 0.18788862228393555 seconds 5: Time to load utils op: 0.18788981437683105 secondsTime to load utils op: 0.18757843971252441 seconds 5: 6: Time to load utils op: 0.1822972297668457 seconds 6: Time to load utils op: 0.18227243423461914 seconds 6: Time to load utils op: 0.18230938911437988 seconds 6: Time to load utils op: 0.1823110580444336 seconds 6: Time to load utils op: 0.18233346939086914 secondsTime to load utils op: 0.1823139190673828 seconds 6: 6: Time to load utils op: 0.18232393264770508 seconds 6: Time to load utils op: 0.18235135078430176 seconds 4: Time to load utils op: 0.18973565101623535 seconds 4: Time to load utils op: 0.18991422653198242 seconds 4: Time to load utils op: 0.1894090175628662 seconds 4: Time to load utils op: 0.18950295448303223 seconds 4: Time to load utils op: 0.18945026397705078 secondsTime to load utils op: 0.18884658813476562 seconds 4: 4: Time to load utils op: 0.1891953945159912 seconds 4: Time to load utils op: 0.18888401985168457 seconds 0: [2023-03-16 18:54:56,259] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 0: [2023-03-16 18:54:56,259] [INFO] [utils.py:828:see_memory_usage] MA 0.28 GB Max_MA 0.28 GB CA 0.31 GB Max_CA 0 GB 0: [2023-03-16 18:54:56,260] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.11 GB, percent = 6.4% 4: Time to load utils op: 0.0009534358978271484 seconds 4: Time to load utils op: 0.0010166168212890625 seconds 4: Time to load utils op: 0.001215219497680664 secondsTime to load utils op: 0.0012791156768798828 seconds 4: 4: Time to load utils op: 
0.00121307373046875 seconds 4: Time to load utils op: 0.0011749267578125 seconds 4: Time to load utils op: 0.0012373924255371094 seconds 4: Time to load utils op: 0.0012860298156738281 seconds 5: Time to load utils op: 0.000492095947265625 seconds 5: Time to load utils op: 0.0005328655242919922 seconds 5: Time to load utils op: 0.0005242824554443359 seconds 5: Time to load utils op: 0.00042724609375 secondsTime to load utils op: 0.0004391670227050781 seconds 5: 5: Time to load utils op: 0.00041604042053222656 seconds 5: Time to load utils op: 0.0004105567932128906 seconds 2: Time to load utils op: 0.0008008480072021484 seconds 5: Time to load utils op: 0.0004754066467285156 seconds 2: Time to load utils op: 0.0010023117065429688 seconds 2: Time to load utils op: 0.0009734630584716797 secondsTime to load utils op: 0.0009527206420898438 seconds 2: 2: Time to load utils op: 0.0009911060333251953 seconds 2: Time to load utils op: 0.0009524822235107422 seconds 2: Time to load utils op: 0.0009717941284179688 seconds 2: Time to load utils op: 0.001073598861694336 seconds 0: Time to load utils op: 0.0005013942718505859 secondsTime to load utils op: 0.0005216598510742188 seconds 0: 0: Time to load utils op: 0.0004355907440185547 secondsTime to load utils op: 0.00047135353088378906 seconds 0: 0: Time to load utils op: 0.0005238056182861328 secondsTime to load utils op: 0.0005595684051513672 seconds 0: 0: Time to load utils op: 0.0004334449768066406 seconds 3: Time to load utils op: 0.0004906654357910156 seconds 3: Time to load utils op: 0.0005214214324951172 seconds 3: Time to load utils op: 0.0005364418029785156 seconds 3: Time to load utils op: 0.00047326087951660156 secondsTime to load utils op: 0.0004525184631347656 seconds 3: 3: Time to load utils op: 0.0005183219909667969 seconds 3: Time to load utils op: 0.0005860328674316406 seconds 3: Time to load utils op: 0.0005230903625488281 seconds 1: Time to load utils op: 0.0008940696716308594 seconds 6: Time to load utils 
op: 0.0012235641479492188 seconds 6: Time to load utils op: 0.0011212825775146484 seconds 1: Time to load utils op: 0.0010037422180175781 seconds 7: Time to load utils op: 0.0010151863098144531 seconds 6: Time to load utils op: 0.0014319419860839844 seconds 6: Time to load utils op: 0.0013480186462402344 seconds 6: Time to load utils op: 0.0013785362243652344 secondsTime to load utils op: 0.0014214515686035156 seconds 6: 1: Time to load utils op: 0.0012362003326416016 seconds 6: Time to load utils op: 0.001416921615600586 seconds 7: Time to load utils op: 0.0011506080627441406 secondsTime to load utils op: 0.001157522201538086 seconds 7: 6: Time to load utils op: 0.001483917236328125 seconds 7: Time to load utils op: 0.0013775825500488281 seconds 1: Time to load utils op: 0.0013580322265625 secondsTime to load utils op: 0.0013201236724853516 seconds 1: 1: Time to load utils op: 0.001363515853881836 secondsTime to load utils op: 0.001268625259399414 seconds 1: 1: Time to load utils op: 0.0014057159423828125 seconds 7: Time to load utils op: 0.0014672279357910156 seconds 7: Time to load utils op: 0.0014650821685791016 seconds 7: Time to load utils op: 0.0014681816101074219 seconds 7: Time to load utils op: 0.0014803409576416016 seconds 0: [2023-03-16 18:54:56,392] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 0: [2023-03-16 18:54:56,393] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB 0: [2023-03-16 18:54:56,393] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% 0: [2023-03-16 18:54:56,502] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 0: [2023-03-16 18:54:56,503] [INFO] [utils.py:828:see_memory_usage] MA 0.62 GB Max_MA 0.62 GB CA 0.82 GB Max_CA 1 GB 0: [2023-03-16 18:54:56,503] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% 0: [2023-03-16 18:54:56,612] [INFO] [utils.py:827:see_memory_usage] after 
initializing group 1 0: [2023-03-16 18:54:56,612] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 18:54:56,612] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% 0: [2023-03-16 18:54:56,717] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 0: [2023-03-16 18:54:56,717] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 18:54:56,718] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% 0: [2023-03-16 18:54:56,825] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 0: [2023-03-16 18:54:56,825] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 18:54:56,825] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% 0: [2023-03-16 18:54:56,929] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer 0: [2023-03-16 18:54:56,930] [INFO] [utils.py:828:see_memory_usage] MA 0.83 GB Max_MA 0.83 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 18:54:56,930] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% 0: [2023-03-16 18:54:57,040] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer 0: [2023-03-16 18:54:57,040] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 18:54:57,040] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% 0: [2023-03-16 18:54:57,146] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer 0: [2023-03-16 18:54:57,146] [INFO] [utils.py:828:see_memory_usage] MA 0.85 GB Max_MA 0.85 GB CA 1.13 GB Max_CA 1 GB 0: [2023-03-16 18:54:57,147] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 32.26 GB, percent = 6.4% 0: [2023-03-16 18:54:57,147] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final 
Optimizer = FusedAdam 0: [2023-03-16 18:54:57,147] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler 0: [2023-03-16 18:54:57,147] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = 0: [2023-03-16 18:54:57,147] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 0: [2023-03-16 18:54:57,147] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] activation_checkpointing_config { 0: "partition_activations": false, 0: "contiguous_memory_optimization": false, 0: "cpu_checkpointing": false, 0: "number_checkpoints": null, 0: "synchronize_checkpoint_boundary": false, 0: "profile": false 0: } 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] amp_enabled .................. False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] amp_params ................... False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] autotuning_config ............ 
{ 0: "enabled": false, 0: "start_step": null, 0: "end_step": null, 0: "metric_path": null, 0: "arg_mappings": null, 0: "metric": "throughput", 0: "model_info": null, 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", 0: "overwrite": true, 0: "fast": true, 0: "start_profile_step": 3, 0: "end_profile_step": 5, 0: "tuner_type": "gridsearch", 0: "tuner_early_stopping": 5, 0: "tuner_num_trials": 50, 0: "model_info_path": null, 0: "mp_size": 1, 0: "max_train_batch_size": null, 0: "min_train_batch_size": 1, 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, 0: "min_train_micro_batch_size_per_gpu": 1, 0: "num_tuning_micro_batch_sizes": 3 0: } 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] bfloat16_enabled ............. True 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] comms_config ................. 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] communication_data_type ...... None 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_pa 0: rameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] curriculum_enabled ........... False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] curriculum_params ............ False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] dataloader_drop_last ......... False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] disable_allgather ............ False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] dump_state ................... False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ 
bert.encoder.layer 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False 0: [2023-03-16 18:54:57,148] [INFO] [config.py:1011:print] elasticity_enabled ........... False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] flops_profiler_config ........ { 0: "enabled": false, 0: "profile_step": 1, 0: "module_depth": -1, 0: "top_modules": 1, 0: "detailed": true, 0: "output_file": null 0: } 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] fp16_auto_cast ............... None 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] fp16_enabled ................. False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] global_rank .................. 0 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] load_universal_checkpoint .... False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] loss_scale ................... 1.0 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] memory_breakdown ............. False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] monitor_config ............... 
0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] nebula_config ................ { 0: "enabled": false, 0: "persistent_storage_path": null, 0: "persistent_time_interval": 100, 0: "num_of_version_in_retention": 2, 0: "enable_nebula_load": true, 0: "load_path": null 0: } 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] optimizer_name ............... None 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] optimizer_params ............. None 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] pld_enabled .................. False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] pld_params ................... False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] prescale_gradients ........... False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] scheduler_name ............... None 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] scheduler_params ............. None 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] sparse_attention ............. None 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] steps_per_print .............. 2000 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] train_batch_size ............. 256 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] use_node_local_storage ....... False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] world_size ................... 
64 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] zero_enabled ................. False 0: [2023-03-16 18:54:57,149] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 0: [2023-03-16 18:54:57,150] [INFO] [config.py:996:print_user_config] json = { 0: "train_micro_batch_size_per_gpu": 4, 0: "train_batch_size": 256, 0: "gradient_clipping": 1.0, 0: "zero_optimization": { 0: "stage": 0 0: }, 0: "bf16": { 0: "enabled": true 0: }, 0: "steps_per_print": 2.000000e+03, 0: "wall_clock_breakdown": false 0: } 0: Time to load utils op: 0.00044345855712890625 seconds 0: [2023-03-16 18:54:57,150] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 0: [2023-03-16 18:54:57,206] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=22 [0, 22) STAGE_PARAMS=146525952 (146.526M) TOTAL_PARAMS=146525952 (146.526M) UNIQUE_PARAMS=146525952 (146.526M) 0: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 4: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 4: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: WARNING: could not find the metadata file checkpoints_146m60b100m 0: will not load any checkpoints and will start from random 4: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
5: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 6: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 3: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 4: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
1: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 6: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 1: [2023-03-16 18:54:57,213] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 5: [2023-03-16 18:54:57,214] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 5: [2023-03-16 18:54:57,214] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 2: [2023-03-16 18:54:57,214] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 2: [2023-03-16 18:54:57,214] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_146m60b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
7: time (ms) | load-checkpoint: 7.59 0: estimated model parameters: 0.146525952 0: estimated model parameters without embeddings: 0.106319616 0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-16 18:54:57 0: > building train, validation, and test datasets ... 0: > datasets target sizes (minimum size): 0: train: 29492188 0: validation: 3072 0: test: 256 0: > building train, validation, and test datasets for GPT ... 0: > building dataset index ... 0: reading sizes... 0: reading pointers... 0: reading document index... 0: creating numpy buffer of mmap... 0: creating memory view of numpy buffer... 
0: > finished creating indexed dataset in 0.007134 seconds 0: number of documents: 208931 0: > dataset split: 0: train: 0: document indices in [0, 208931) total of 208931 documents 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_29492188ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_29492188ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_29492188ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.095 seconds 0: total number of samples: 29526954 0: total number of epochs: 605 0: > building dataset index ... 0: reading sizes... 0: reading pointers... 0: reading document index... 0: creating numpy buffer of mmap... 0: creating memory view of numpy buffer... 0: > finished creating indexed dataset in 0.036595 seconds 0: number of documents: 364608 0: > dataset split: 0: validation: 0: document indices in [0, 364608) total of 364608 documents 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_3072ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_3072ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_3072ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.073 seconds 0: total number of samples: 84978 0: total number of epochs: 1 0: > finished creating GPT datasets ... 0: [after dataloaders are built] datetime: 2023-03-16 18:55:11 0: done with setup ... 0: training ... 
0: Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: 7: time (ms) | model-and-optimizer-setup: 17834.99 | train/valid/test-data-iterators-setup: 13987.47 0: [000-000] 0.1465B / 0.1063B 0: [before the start of training step] datetime: 2023-03-16 18:55:11 0: [2023-03-16 18:55:13,023] [INFO] [checkpointing.py:553:forward] Activation Checkpointing Information 0: [2023-03-16 18:55:13,024] [INFO] [checkpointing.py:554:forward] ----Partition Activations False, CPU CHECKPOINTING False 0: [2023-03-16 18:55:13,024] [INFO] [checkpointing.py:557:forward] ----contiguous Memory Checkpointing False with None total layers 0: [2023-03-16 18:55:13,024] [INFO] [checkpointing.py:560:forward] ----Synchronization False 0: [2023-03-16 18:55:13,024] [INFO] [checkpointing.py:561:forward] ----Profiling time in checkpointing False 0: [Rank 0] (after 100 iterations) memory (MB) | allocated: 2730.85986328125 | max allocated: 5304.484375 | reserved: 6818.0 | max reserved: 6818.0 7: iteration 100/ 115203 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (s): 0.50 | learning rate: 1.736E-05 | global batch size: 256 | lm loss: 9.483098E+00 | grad norm: 1.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 513.760 | TFLOPs: 23.98 | 7: iteration 200/ 115203 | consumed samples: 51200 | consumed tokens: 104857600 | elapsed time per iteration (s): 0.39 | learning rate: 3.472E-05 | global batch size: 256 | lm loss: 7.663513E+00 | grad norm: 0.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 650.153 | TFLOPs: 30.35 | 7: iteration 300/ 115203 | consumed samples: 76800 | consumed tokens: 157286400 | elapsed time per iteration (s): 0.39 | learning rate: 5.208E-05 | global batch size: 256 | lm loss: 6.812288E+00 | grad norm: 0.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 661.277 
| TFLOPs: 30.87 | 7: iteration 400/ 115203 | consumed samples: 102400 | consumed tokens: 209715200 | elapsed time per iteration (s): 0.39 | learning rate: 6.944E-05 | global batch size: 256 | lm loss: 6.466247E+00 | grad norm: 0.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 659.147 | TFLOPs: 30.77 | 7: iteration 500/ 115203 | consumed samples: 128000 | consumed tokens: 262144000 | elapsed time per iteration (s): 0.39 | learning rate: 8.680E-05 | global batch size: 256 | lm loss: 6.260057E+00 | grad norm: 0.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 657.380 | TFLOPs: 30.68 | 7: iteration 600/ 115203 | consumed samples: 153600 | consumed tokens: 314572800 | elapsed time per iteration (s): 0.39 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 6.076217E+00 | grad norm: 1.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.685 | TFLOPs: 31.03 | 7: iteration 700/ 115203 | consumed samples: 179200 | consumed tokens: 367001600 | elapsed time per iteration (s): 0.39 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 5.884296E+00 | grad norm: 1.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.365 | TFLOPs: 30.96 | 7: iteration 800/ 115203 | consumed samples: 204800 | consumed tokens: 419430400 | elapsed time per iteration (s): 0.38 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 5.699525E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.401 | TFLOPs: 31.25 | 7: iteration 900/ 115203 | consumed samples: 230400 | consumed tokens: 471859200 | elapsed time per iteration (s): 0.38 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 5.514153E+00 | grad norm: 1.137 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 666.398 | TFLOPs: 31.10 | 7: iteration 1000/ 115203 | consumed samples: 256000 | consumed tokens: 524288000 | elapsed time per iteration (s): 0.39 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 5.333961E+00 | grad norm: 1.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 659.441 | TFLOPs: 30.78 | 7: iteration 1100/ 115203 | consumed samples: 281600 | consumed tokens: 576716800 | elapsed time per iteration (s): 0.39 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 5.167001E+00 | grad norm: 1.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.906 | TFLOPs: 30.94 | 7: iteration 1200/ 115203 | consumed samples: 307200 | consumed tokens: 629145600 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.986725E+00 | grad norm: 0.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 654.624 | TFLOPs: 30.56 | 7: iteration 1300/ 115203 | consumed samples: 332800 | consumed tokens: 681574400 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.796720E+00 | grad norm: 0.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 659.373 | TFLOPs: 30.78 | 7: iteration 1400/ 115203 | consumed samples: 358400 | consumed tokens: 734003200 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.665944E+00 | grad norm: 0.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.253 | TFLOPs: 31.00 | 7: iteration 1500/ 115203 | consumed samples: 384000 | consumed tokens: 786432000 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm 
loss: 4.577286E+00 | grad norm: 0.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 660.747 | TFLOPs: 30.84 | 7: iteration 1600/ 115203 | consumed samples: 409600 | consumed tokens: 838860800 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.496804E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.463 | TFLOPs: 31.01 | 7: iteration 1700/ 115203 | consumed samples: 435200 | consumed tokens: 891289600 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.443250E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 658.233 | TFLOPs: 30.72 | 7: iteration 1800/ 115203 | consumed samples: 460800 | consumed tokens: 943718400 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.387316E+00 | grad norm: 0.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.796 | TFLOPs: 31.03 | 7: iteration 1900/ 115203 | consumed samples: 486400 | consumed tokens: 996147200 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.344477E+00 | grad norm: 0.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.655 | TFLOPs: 30.93 | 0: [2023-03-16 19:08:17,258] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[0.0001999754506631688, 0.0001999754506631688, 0.0001999754506631688], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 2000/ 115203 | consumed samples: 512000 | consumed tokens: 1048576000 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.292479E+00 | grad norm: 0.650 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 660.282 | TFLOPs: 30.82 | 0: steps: 2000 loss: 4.2848 iter time (s): 0.391 samples/sec: 655.049 7: iteration 2100/ 115203 | consumed samples: 537600 | consumed tokens: 1101004800 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.257440E+00 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.785 | TFLOPs: 31.08 | 7: iteration 2200/ 115203 | consumed samples: 563200 | consumed tokens: 1153433600 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.225155E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.541 | TFLOPs: 30.97 | 7: iteration 2300/ 115203 | consumed samples: 588800 | consumed tokens: 1205862400 | elapsed time per iteration (s): 0.39 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 4.192719E+00 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 656.594 | TFLOPs: 30.65 | 7: iteration 2400/ 115203 | consumed samples: 614400 | consumed tokens: 1258291200 | elapsed time per iteration (s): 0.38 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.165911E+00 | grad norm: 0.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.146 | TFLOPs: 31.14 | 7: iteration 2500/ 115203 | consumed samples: 640000 | consumed tokens: 1310720000 | elapsed time per iteration (s): 0.38 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.138177E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.456 | TFLOPs: 31.20 | 7: iteration 2600/ 115203 | consumed samples: 665600 | consumed tokens: 1363148800 | 
elapsed time per iteration (s): 0.38 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.110797E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.275 | TFLOPs: 31.19 | 7: iteration 2700/ 115203 | consumed samples: 691200 | consumed tokens: 1415577600 | elapsed time per iteration (s): 0.38 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.085222E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.156 | TFLOPs: 31.14 | 7: iteration 2800/ 115203 | consumed samples: 716800 | consumed tokens: 1468006400 | elapsed time per iteration (s): 0.38 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.065995E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.677 | TFLOPs: 31.21 | 7: iteration 2900/ 115203 | consumed samples: 742400 | consumed tokens: 1520435200 | elapsed time per iteration (s): 0.38 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.043055E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.648 | TFLOPs: 31.49 | 7: iteration 3000/ 115203 | consumed samples: 768000 | consumed tokens: 1572864000 | elapsed time per iteration (s): 0.39 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.019037E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.059 | TFLOPs: 31.00 | 7: iteration 3100/ 115203 | consumed samples: 793600 | consumed tokens: 1625292800 | elapsed time per iteration (s): 0.38 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 4.001087E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.125 | TFLOPs: 31.23 | 7: 
iteration 3200/ 115203 | consumed samples: 819200 | consumed tokens: 1677721600 | elapsed time per iteration (s): 0.38 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.980309E+00 | grad norm: 0.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.134 | TFLOPs: 31.61 | 7: iteration 3300/ 115203 | consumed samples: 844800 | consumed tokens: 1730150400 | elapsed time per iteration (s): 0.38 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.963033E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.579 | TFLOPs: 31.21 | 7: iteration 3400/ 115203 | consumed samples: 870400 | consumed tokens: 1782579200 | elapsed time per iteration (s): 0.38 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.945054E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.484 | TFLOPs: 31.25 | 7: iteration 3500/ 115203 | consumed samples: 896000 | consumed tokens: 1835008000 | elapsed time per iteration (s): 0.38 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.928683E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.849 | TFLOPs: 31.41 | 7: iteration 3600/ 115203 | consumed samples: 921600 | consumed tokens: 1887436800 | elapsed time per iteration (s): 0.38 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.914384E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.929 | TFLOPs: 31.22 | 7: iteration 3700/ 115203 | consumed samples: 947200 | consumed tokens: 1939865600 | elapsed time per iteration (s): 0.38 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.897822E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 676.237 | TFLOPs: 31.56 | 7: iteration 3800/ 115203 | consumed samples: 972800 | consumed tokens: 1992294400 | elapsed time per iteration (s): 0.38 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 3.887825E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.713 | TFLOPs: 31.63 | 7: iteration 3900/ 115203 | consumed samples: 998400 | consumed tokens: 2044723200 | elapsed time per iteration (s): 0.38 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.871966E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.952 | TFLOPs: 31.27 | 0: [2023-03-16 19:21:02,326] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=0, lr=[0.00019972320825211248, 0.00019972320825211248, 0.00019972320825211248], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 4000/ 115203 | consumed samples: 1024000 | consumed tokens: 2097152000 | elapsed time per iteration (s): 0.38 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.857705E+00 | grad norm: 0.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.811 | TFLOPs: 31.31 | 0: steps: 4000 loss: 3.8432 iter time (s): 0.381 samples/sec: 672.774 7: iteration 4100/ 115203 | consumed samples: 1049600 | consumed tokens: 2149580800 | elapsed time per iteration (s): 0.38 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.845897E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.436 | TFLOPs: 31.39 | 7: iteration 4200/ 115203 | consumed samples: 1075200 | consumed tokens: 2202009600 | elapsed time per iteration (s): 0.38 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.829057E+00 | grad norm: 0.360 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.901 | TFLOPs: 31.60 | 7: iteration 4300/ 115203 | consumed samples: 1100800 | consumed tokens: 2254438400 | elapsed time per iteration (s): 0.38 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 3.821615E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.740 | TFLOPs: 31.59 | 7: iteration 4400/ 115203 | consumed samples: 1126400 | consumed tokens: 2306867200 | elapsed time per iteration (s): 0.38 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.806991E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.857 | TFLOPs: 31.50 | 7: iteration 4500/ 115203 | consumed samples: 1152000 | consumed tokens: 2359296000 | elapsed time per iteration (s): 0.38 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.795804E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.442 | TFLOPs: 31.71 | 7: iteration 4600/ 115203 | consumed samples: 1177600 | consumed tokens: 2411724800 | elapsed time per iteration (s): 0.38 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.786408E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.562 | TFLOPs: 31.53 | 7: iteration 4700/ 115203 | consumed samples: 1203200 | consumed tokens: 2464153600 | elapsed time per iteration (s): 0.38 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.773548E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.550 | TFLOPs: 31.67 | 7: iteration 4800/ 115203 | consumed samples: 1228800 | consumed tokens: 2516582400 | elapsed time per iteration (s): 0.38 | learning rate: 1.995E-04 | global batch 
size: 256 | lm loss: 3.760621E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.433 | TFLOPs: 31.85 | 7: iteration 4900/ 115203 | consumed samples: 1254400 | consumed tokens: 2569011200 | elapsed time per iteration (s): 0.38 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.753471E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.367 | TFLOPs: 31.62 | 7: iteration 5000/ 115203 | consumed samples: 1280000 | consumed tokens: 2621440000 | elapsed time per iteration (s): 0.38 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.740575E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.565 | TFLOPs: 31.77 | 7: iteration 5100/ 115203 | consumed samples: 1305600 | consumed tokens: 2673868800 | elapsed time per iteration (s): 0.38 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 3.734401E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.200 | TFLOPs: 31.66 | 7: iteration 5200/ 115203 | consumed samples: 1331200 | consumed tokens: 2726297600 | elapsed time per iteration (s): 0.38 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.719905E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.918 | TFLOPs: 31.78 | 7: iteration 5300/ 115203 | consumed samples: 1356800 | consumed tokens: 2778726400 | elapsed time per iteration (s): 0.38 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.716728E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.043 | TFLOPs: 31.65 | 7: iteration 5400/ 115203 | consumed samples: 1382400 | consumed tokens: 
2831155200 | elapsed time per iteration (s): 0.38 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.703205E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.210 | TFLOPs: 31.84 | 7: iteration 5500/ 115203 | consumed samples: 1408000 | consumed tokens: 2883584000 | elapsed time per iteration (s): 0.38 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 3.698201E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.096 | TFLOPs: 31.74 | 7: iteration 5600/ 115203 | consumed samples: 1433600 | consumed tokens: 2936012800 | elapsed time per iteration (s): 0.38 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.688683E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.599 | TFLOPs: 31.86 | 7: iteration 5700/ 115203 | consumed samples: 1459200 | consumed tokens: 2988441600 | elapsed time per iteration (s): 0.38 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.680162E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.271 | TFLOPs: 31.80 | 7: iteration 5800/ 115203 | consumed samples: 1484800 | consumed tokens: 3040870400 | elapsed time per iteration (s): 0.38 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 3.674152E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.854 | TFLOPs: 31.78 | 7: iteration 5900/ 115203 | consumed samples: 1510400 | consumed tokens: 3093299200 | elapsed time per iteration (s): 0.38 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.661846E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.611 | 
TFLOPs: 31.86 | 0: [2023-03-16 19:33:36,177] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=0, lr=[0.00019919872690019844, 0.00019919872690019844, 0.00019919872690019844], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 6000/ 115203 | consumed samples: 1536000 | consumed tokens: 3145728000 | elapsed time per iteration (s): 0.38 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.655221E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.161 | TFLOPs: 31.84 | 0: steps: 6000 loss: 3.6664 iter time (s): 0.375 samples/sec: 682.836 7: iteration 6100/ 115203 | consumed samples: 1561600 | consumed tokens: 3198156800 | elapsed time per iteration (s): 0.37 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 3.647129E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.382 | TFLOPs: 31.90 | 7: iteration 6200/ 115203 | consumed samples: 1587200 | consumed tokens: 3250585600 | elapsed time per iteration (s): 0.38 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.640304E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.657 | TFLOPs: 31.82 | 7: iteration 6300/ 115203 | consumed samples: 1612800 | consumed tokens: 3303014400 | elapsed time per iteration (s): 0.38 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.635871E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.994 | TFLOPs: 31.83 | 7: iteration 6400/ 115203 | consumed samples: 1638400 | consumed tokens: 3355443200 | elapsed time per iteration (s): 0.37 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 3.629253E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 683.532 | TFLOPs: 31.90 | 7: iteration 6500/ 115203 | consumed samples: 1664000 | consumed tokens: 3407872000 | elapsed time per iteration (s): 0.37 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.619674E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.596 | TFLOPs: 31.91 | 7: iteration 6600/ 115203 | consumed samples: 1689600 | consumed tokens: 3460300800 | elapsed time per iteration (s): 0.37 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.613089E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.450 | TFLOPs: 31.90 | 7: iteration 6700/ 115203 | consumed samples: 1715200 | consumed tokens: 3512729600 | elapsed time per iteration (s): 0.38 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 3.605975E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.521 | TFLOPs: 31.86 | 7: iteration 6800/ 115203 | consumed samples: 1740800 | consumed tokens: 3565158400 | elapsed time per iteration (s): 0.37 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.595848E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.752 | TFLOPs: 31.92 | 7: iteration 6900/ 115203 | consumed samples: 1766400 | consumed tokens: 3617587200 | elapsed time per iteration (s): 0.37 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 3.589080E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.706 | TFLOPs: 31.91 | 7: iteration 7000/ 115203 | consumed samples: 1792000 | consumed tokens: 3670016000 | elapsed time per iteration (s): 0.37 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.586764E+00 | grad norm: 0.345 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.700 | TFLOPs: 31.87 | 7: iteration 7100/ 115203 | consumed samples: 1817600 | consumed tokens: 3722444800 | elapsed time per iteration (s): 0.37 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.585169E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.246 | TFLOPs: 31.98 | 7: iteration 7200/ 115203 | consumed samples: 1843200 | consumed tokens: 3774873600 | elapsed time per iteration (s): 0.37 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 3.578696E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.361 | TFLOPs: 31.90 | 7: iteration 7300/ 115203 | consumed samples: 1868800 | consumed tokens: 3827302400 | elapsed time per iteration (s): 0.37 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.568722E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.802 | TFLOPs: 31.92 | 7: iteration 7400/ 115203 | consumed samples: 1894400 | consumed tokens: 3879731200 | elapsed time per iteration (s): 0.37 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 3.570744E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.932 | TFLOPs: 31.92 | 7: iteration 7500/ 115203 | consumed samples: 1920000 | consumed tokens: 3932160000 | elapsed time per iteration (s): 0.37 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 3.558316E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.260 | TFLOPs: 31.94 | 7: iteration 7600/ 115203 | consumed samples: 1945600 | consumed tokens: 3984588800 | elapsed time per iteration (s): 0.37 | learning rate: 
1.986E-04 | global batch size: 256 | lm loss: 3.550464E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 685.474 | TFLOPs: 32.00 | 7: iteration 7700/ 115203 | consumed samples: 1971200 | consumed tokens: 4037017600 | elapsed time per iteration (s): 0.37 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.548856E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.862 | TFLOPs: 31.97 | 7: iteration 7800/ 115203 | consumed samples: 1996800 | consumed tokens: 4089446400 | elapsed time per iteration (s): 0.37 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 3.541788E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.798 | TFLOPs: 31.92 | 7: iteration 7900/ 115203 | consumed samples: 2022400 | consumed tokens: 4141875200 | elapsed time per iteration (s): 0.37 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.538268E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.628 | TFLOPs: 31.91 | 0: [2023-03-16 19:46:05,226] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=0, lr=[0.00019840359799331808, 0.00019840359799331808, 0.00019840359799331808], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 8000/ 115203 | consumed samples: 2048000 | consumed tokens: 4194304000 | elapsed time per iteration (s): 0.38 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.531741E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.054 | TFLOPs: 31.84 | 0: steps: 8000 loss: 3.5197 iter time (s): 0.373 samples/sec: 687.086 7: iteration 8100/ 115203 | consumed samples: 2073600 | consumed tokens: 4246732800 | elapsed time per iteration (s): 
0.38 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 3.529619E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.975 | TFLOPs: 31.74 | 7: iteration 8200/ 115203 | consumed samples: 2099200 | consumed tokens: 4299161600 | elapsed time per iteration (s): 0.38 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.520056E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.483 | TFLOPs: 31.76 | 7: iteration 8300/ 115203 | consumed samples: 2124800 | consumed tokens: 4351590400 | elapsed time per iteration (s): 0.38 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 3.517895E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.210 | TFLOPs: 31.80 | 7: iteration 8400/ 115203 | consumed samples: 2150400 | consumed tokens: 4404019200 | elapsed time per iteration (s): 0.38 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.514415E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.663 | TFLOPs: 31.82 | 7: iteration 8500/ 115203 | consumed samples: 2176000 | consumed tokens: 4456448000 | elapsed time per iteration (s): 0.37 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 3.509266E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.922 | TFLOPs: 31.88 | 7: iteration 8600/ 115203 | consumed samples: 2201600 | consumed tokens: 4508876800 | elapsed time per iteration (s): 0.38 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.507711E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.770 | TFLOPs: 31.73 | 7: iteration 8700/ 115203 | 
consumed samples: 2227200 | consumed tokens: 4561305600 | elapsed time per iteration (s): 0.37 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 3.497705E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.913 | TFLOPs: 31.88 | 7: iteration 8800/ 115203 | consumed samples: 2252800 | consumed tokens: 4613734400 | elapsed time per iteration (s): 0.38 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.494282E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.511 | TFLOPs: 31.76 | 7: iteration 8900/ 115203 | consumed samples: 2278400 | consumed tokens: 4666163200 | elapsed time per iteration (s): 0.38 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 3.488440E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.400 | TFLOPs: 31.81 | 7: iteration 9000/ 115203 | consumed samples: 2304000 | consumed tokens: 4718592000 | elapsed time per iteration (s): 0.38 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.488786E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.914 | TFLOPs: 31.83 | 7: iteration 9100/ 115203 | consumed samples: 2329600 | consumed tokens: 4771020800 | elapsed time per iteration (s): 0.38 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.484019E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.274 | TFLOPs: 31.85 | 7: iteration 9200/ 115203 | consumed samples: 2355200 | consumed tokens: 4823449600 | elapsed time per iteration (s): 0.38 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 3.477020E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 682.446 | TFLOPs: 31.85 | 7: iteration 9300/ 115203 | consumed samples: 2380800 | consumed tokens: 4875878400 | elapsed time per iteration (s): 0.38 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.471386E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.732 | TFLOPs: 31.82 | 7: iteration 9400/ 115203 | consumed samples: 2406400 | consumed tokens: 4928307200 | elapsed time per iteration (s): 0.38 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 3.469727E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.965 | TFLOPs: 31.83 | 7: iteration 9500/ 115203 | consumed samples: 2432000 | consumed tokens: 4980736000 | elapsed time per iteration (s): 0.38 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.462975E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.227 | TFLOPs: 31.84 | 7: iteration 9600/ 115203 | consumed samples: 2457600 | consumed tokens: 5033164800 | elapsed time per iteration (s): 0.37 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 3.461214E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.910 | TFLOPs: 31.88 | 7: iteration 9700/ 115203 | consumed samples: 2483200 | consumed tokens: 5085593600 | elapsed time per iteration (s): 0.37 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.459602E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.524 | TFLOPs: 31.90 | 7: iteration 9800/ 115203 | consumed samples: 2508800 | consumed tokens: 5138022400 | elapsed time per iteration (s): 0.37 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 3.453040E+00 | 
grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.752 | TFLOPs: 31.87 | 7: iteration 9900/ 115203 | consumed samples: 2534400 | consumed tokens: 5190451200 | elapsed time per iteration (s): 0.38 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 3.448749E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.909 | TFLOPs: 31.83 | 0: [2023-03-16 19:58:36,855] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=0, lr=[0.00019734023411853413, 0.00019734023411853413, 0.00019734023411853413], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 10000/ 115203 | consumed samples: 2560000 | consumed tokens: 5242880000 | elapsed time per iteration (s): 0.38 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.448846E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.497 | TFLOPs: 31.25 | 0: steps: 10000 loss: 3.4651 iter time (s): 0.374 samples/sec: 684.977 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 10000 | lm loss value: 3.805312E+00 | lm loss PPL: 4.493925E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 10000 to checkpoints_146m60b100m 0: [2023-03-16 19:58:36,978] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step10000 is begin to save! 0: [2023-03-16 19:58:37,667] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_01-model_00-model_states.pt... 0: [2023-03-16 19:58:37,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_01-model_00-model_states.pt. 
0: [2023-03-16 19:58:37,764] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_03-model_00-model_states.pt... 0: [2023-03-16 19:58:37,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_03-model_00-model_states.pt. 0: [2023-03-16 19:58:37,782] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_04-model_00-model_states.pt... 0: [2023-03-16 19:58:37,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_04-model_00-model_states.pt. 0: [2023-03-16 19:58:37,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_05-model_00-model_states.pt... 0: [2023-03-16 19:58:37,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_05-model_00-model_states.pt. 0: [2023-03-16 19:58:37,815] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_06-model_00-model_states.pt... 0: [2023-03-16 19:58:37,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_06-model_00-model_states.pt. 0: [2023-03-16 19:58:37,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_07-model_00-model_states.pt... 0: [2023-03-16 19:58:37,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_07-model_00-model_states.pt. 0: [2023-03-16 19:58:37,846] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_08-model_00-model_states.pt... 0: [2023-03-16 19:58:37,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_08-model_00-model_states.pt. 
0: [2023-03-16 19:58:37,862] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_09-model_00-model_states.pt... 0: [2023-03-16 19:58:37,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_09-model_00-model_states.pt. 0: [2023-03-16 19:58:37,878] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_10-model_00-model_states.pt... 0: [2023-03-16 19:58:37,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_10-model_00-model_states.pt. 0: [2023-03-16 19:58:37,894] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_11-model_00-model_states.pt... 0: [2023-03-16 19:58:37,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_11-model_00-model_states.pt. 0: [2023-03-16 19:58:37,910] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_12-model_00-model_states.pt... 0: [2023-03-16 19:58:37,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_12-model_00-model_states.pt. 0: [2023-03-16 19:58:37,926] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_13-model_00-model_states.pt... 0: [2023-03-16 19:58:37,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_13-model_00-model_states.pt. 0: [2023-03-16 19:58:37,941] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_14-model_00-model_states.pt... 0: [2023-03-16 19:58:37,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_14-model_00-model_states.pt. 
0: [2023-03-16 19:58:37,958] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_15-model_00-model_states.pt... 0: [2023-03-16 19:58:37,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_15-model_00-model_states.pt. 0: [2023-03-16 19:58:37,974] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_16-model_00-model_states.pt... 0: [2023-03-16 19:58:37,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_16-model_00-model_states.pt. 0: [2023-03-16 19:58:37,990] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_17-model_00-model_states.pt... 0: [2023-03-16 19:58:38,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_17-model_00-model_states.pt. 0: [2023-03-16 19:58:38,005] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/layer_19-model_00-model_states.pt... 0: [2023-03-16 19:58:38,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/layer_19-model_00-model_states.pt. 0: [2023-03-16 19:58:38,007] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step10000/mp_rank_00_model_states.pt 0: [2023-03-16 19:58:38,008] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/mp_rank_00_model_states.pt... 0: [2023-03-16 19:58:38,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/mp_rank_00_model_states.pt. 0: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 3: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 4: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 0: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 0: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 6: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 3: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 4: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
5: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 7: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
0: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 3: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 4: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 5: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 7: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 2: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 4: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 7: [2023-03-16 19:58:38,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-16 19:58:38,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-16 19:58:38,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-16 19:58:38,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-16 19:58:38,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-16 19:58:38,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-16 19:58:38,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-16 19:58:38,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-16 19:58:38,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-16 19:58:38,077] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-16 19:58:38,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-16 19:58:38,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-16 19:58:38,078] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-16 19:58:38,078] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-16 19:58:38,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-16 19:58:38,078] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-16 19:58:38,078] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-16 19:58:38,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-16 19:58:38,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-16 19:58:38,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-16 19:58:38,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-16 19:58:38,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-16 19:58:38,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-16 19:58:38,081] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-16 19:58:38,081] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-16 19:58:38,081] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-16 19:58:38,081] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-16 19:58:38,081] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-16 19:58:38,081] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-16 19:58:38,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-16 19:58:38,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-16 19:58:38,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-16 19:58:38,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-16 19:58:38,085] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-16 19:58:38,085] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
0: [2023-03-16 19:58:38,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-16 19:58:38,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-16 19:58:38,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-16 19:58:38,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-16 19:58:38,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-16 19:58:38,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-16 19:58:38,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-16 19:58:38,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-16 19:58:38,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-16 19:58:38,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-16 19:58:38,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-16 19:58:38,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
4: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-16 19:58:38,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-16 19:58:38,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-16 19:58:38,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-16 19:58:38,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
4: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-16 19:58:38,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-16 19:58:38,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-16 19:58:38,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-16 19:58:38,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
5: [2023-03-16 19:58:38,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-16 19:58:38,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-16 19:58:38,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-16 19:58:38,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-16 19:58:38,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-16 19:58:38,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-16 19:58:38,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-16 19:58:38,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-16 19:58:38,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-16 19:58:38,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-16 19:58:38,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-16 19:58:38,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-16 19:58:38,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-16 19:58:38,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-16 19:58:38,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-16 19:58:38,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-16 19:58:38,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
5: [2023-03-16 19:58:38,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-16 19:58:38,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-16 19:58:38,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-16 19:58:38,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-16 19:58:38,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-16 19:58:38,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-16 19:58:38,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-16 19:58:38,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-16 19:58:38,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-16 19:58:38,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-16 19:58:38,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-16 19:58:38,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-16 19:58:38,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-16 19:58:38,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
1: [2023-03-16 19:58:38,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-16 19:58:38,106] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-16 19:58:38,106] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
2: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
2: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
7: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
7: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-16 19:58:38,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
7: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-16 19:58:38,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-16 19:58:38,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-16 19:58:38,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-16 19:58:38,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-16 19:58:38,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-16 19:58:38,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-16 19:58:38,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-16 19:58:38,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-16 19:58:38,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-16 19:58:38,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
0: successfully saved checkpoint at iteration 10000 to checkpoints_146m60b100m
7: time (ms) | save-checkpoint: 1142.84
7: iteration 10100/ 115203 | consumed samples: 2585600 | consumed tokens: 5295308800 | elapsed time per iteration (s): 0.39 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 3.443184E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 658.092 | TFLOPs: 30.72 |
7: iteration 10200/ 115203 | consumed samples: 2611200 | consumed tokens: 5347737600 | elapsed time per iteration (s): 0.39 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.441904E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.465 | TFLOPs: 31.01 |
7: iteration 10300/ 115203 | consumed samples: 2636800 | consumed tokens: 5400166400 | elapsed time per iteration (s): 0.39 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 3.436436E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 655.750 | TFLOPs: 30.61 |
7: iteration 10400/ 115203 | consumed samples: 2662400 | consumed tokens: 5452595200 | elapsed time per iteration (s): 0.38 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 3.435470E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.103 | TFLOPs: 31.04 |
7: iteration 10500/ 115203 | consumed samples: 2688000 | consumed tokens: 5505024000 | elapsed time per iteration (s): 0.38 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.429931E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.936 | TFLOPs: 31.08 |
7: iteration 10600/ 115203 | consumed samples: 2713600 | consumed tokens: 5557452800 | elapsed time per iteration (s): 0.38 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 3.427238E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.408 | TFLOPs: 31.48 |
7: iteration 10700/ 115203 | consumed samples: 2739200 | consumed tokens: 5609881600 | elapsed time per iteration (s): 0.38 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 3.419451E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.720 | TFLOPs: 31.45 |
7: iteration 10800/ 115203 | consumed samples: 2764800 | consumed tokens: 5662310400 | elapsed time per iteration (s): 0.38 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.420304E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.664 | TFLOPs: 31.44 |
7: iteration 10900/ 115203 | consumed samples: 2790400 | consumed tokens: 5714739200 | elapsed time per iteration (s): 0.38 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 3.419977E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.347 | TFLOPs: 31.62 |
7: iteration 11000/ 115203 | consumed samples: 2816000 | consumed tokens: 5767168000 | elapsed time per iteration (s): 0.38 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 3.412847E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.959 | TFLOPs: 31.64 |
7: iteration 11100/ 115203 | consumed samples: 2841600 | consumed tokens: 5819596800 | elapsed time per iteration (s): 0.38 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.413156E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.463 | TFLOPs: 31.62 |
7: iteration 11200/ 115203 | consumed samples: 2867200 | consumed tokens: 5872025600 | elapsed time per iteration (s): 0.38 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 3.405965E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.363 | TFLOPs: 31.62 |
7: iteration 11300/ 115203 | consumed samples: 2892800 | consumed tokens: 5924454400 | elapsed time per iteration (s): 0.38 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 3.403853E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.212 | TFLOPs: 31.56 |
7: iteration 11400/ 115203 | consumed samples: 2918400 | consumed tokens: 5976883200 | elapsed time per iteration (s): 0.38 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.401856E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.778 | TFLOPs: 31.64 |
7: iteration 11500/ 115203 | consumed samples: 2944000 | consumed tokens: 6029312000 | elapsed time per iteration (s): 0.38 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 3.401828E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.256 | TFLOPs: 31.71 |
7: iteration 11600/ 115203 | consumed samples: 2969600 | consumed tokens: 6081740800 | elapsed time per iteration (s): 0.38 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 3.393804E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.303 | TFLOPs: 31.61 |
7: iteration 11700/ 115203 | consumed samples: 2995200 | consumed tokens: 6134169600 | elapsed time per iteration (s): 0.38 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.392207E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 678.192 | TFLOPs: 31.66 | 7: iteration 11800/ 115203 | consumed samples: 3020800 | consumed tokens: 6186598400 | elapsed time per iteration (s): 0.38 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.394225E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.273 | TFLOPs: 31.43 | 7: iteration 11900/ 115203 | consumed samples: 3046400 | consumed tokens: 6239027200 | elapsed time per iteration (s): 0.38 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 3.388298E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.269 | TFLOPs: 31.61 | 0: [2023-03-16 20:11:17,666] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=0, lr=[0.0001960118617437879, 0.0001960118617437879, 0.0001960118617437879], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 12000/ 115203 | consumed samples: 3072000 | consumed tokens: 6291456000 | elapsed time per iteration (s): 0.38 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 3.381953E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.264 | TFLOPs: 31.75 | 0: steps: 12000 loss: 3.3659 iter time (s): 0.378 samples/sec: 676.598 7: iteration 12100/ 115203 | consumed samples: 3097600 | consumed tokens: 6343884800 | elapsed time per iteration (s): 0.38 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.382831E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.139 | TFLOPs: 31.65 | 7: iteration 12200/ 115203 | consumed samples: 3123200 | consumed tokens: 6396313600 | elapsed time per iteration (s): 0.38 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 3.383112E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 677.700 | TFLOPs: 31.63 | 7: iteration 12300/ 115203 | consumed samples: 3148800 | consumed tokens: 6448742400 | elapsed time per iteration (s): 0.38 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 3.374815E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.557 | TFLOPs: 31.63 | 7: iteration 12400/ 115203 | consumed samples: 3174400 | consumed tokens: 6501171200 | elapsed time per iteration (s): 0.38 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 3.377033E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.427 | TFLOPs: 31.67 | 7: iteration 12500/ 115203 | consumed samples: 3200000 | consumed tokens: 6553600000 | elapsed time per iteration (s): 0.38 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.372816E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.429 | TFLOPs: 31.76 | 7: iteration 12600/ 115203 | consumed samples: 3225600 | consumed tokens: 6606028800 | elapsed time per iteration (s): 0.38 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 3.365264E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.318 | TFLOPs: 31.66 | 7: iteration 12700/ 115203 | consumed samples: 3251200 | consumed tokens: 6658457600 | elapsed time per iteration (s): 0.38 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 3.365342E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.800 | TFLOPs: 31.78 | 7: iteration 12800/ 115203 | consumed samples: 3276800 | consumed tokens: 6710886400 | elapsed time per iteration (s): 0.37 | learning rate: 1.954E-04 | global batch size: 256 | lm 
loss: 3.364917E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.196 | TFLOPs: 31.89 | 7: iteration 12900/ 115203 | consumed samples: 3302400 | consumed tokens: 6763315200 | elapsed time per iteration (s): 0.37 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 3.361602E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.902 | TFLOPs: 31.88 | 7: iteration 13000/ 115203 | consumed samples: 3328000 | consumed tokens: 6815744000 | elapsed time per iteration (s): 0.38 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.361462E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.411 | TFLOPs: 31.81 | 7: iteration 13100/ 115203 | consumed samples: 3353600 | consumed tokens: 6868172800 | elapsed time per iteration (s): 0.38 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 3.362159E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.907 | TFLOPs: 31.83 | 7: iteration 13200/ 115203 | consumed samples: 3379200 | consumed tokens: 6920601600 | elapsed time per iteration (s): 0.37 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 3.357452E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.094 | TFLOPs: 31.88 | 7: iteration 13300/ 115203 | consumed samples: 3404800 | consumed tokens: 6973030400 | elapsed time per iteration (s): 0.37 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 3.351623E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.563 | TFLOPs: 31.91 | 7: iteration 13400/ 115203 | consumed samples: 3430400 | consumed tokens: 7025459200 | 
elapsed time per iteration (s): 0.37 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 3.349235E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.567 | TFLOPs: 31.91 | 7: iteration 13500/ 115203 | consumed samples: 3456000 | consumed tokens: 7077888000 | elapsed time per iteration (s): 0.37 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.344056E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.139 | TFLOPs: 31.89 | 7: iteration 13600/ 115203 | consumed samples: 3481600 | consumed tokens: 7130316800 | elapsed time per iteration (s): 0.37 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 3.345297E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.600 | TFLOPs: 31.91 | 7: iteration 13700/ 115203 | consumed samples: 3507200 | consumed tokens: 7182745600 | elapsed time per iteration (s): 0.37 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 3.339191E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.953 | TFLOPs: 31.92 | 7: iteration 13800/ 115203 | consumed samples: 3532800 | consumed tokens: 7235174400 | elapsed time per iteration (s): 0.37 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 3.338412E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.185 | TFLOPs: 31.89 | 7: iteration 13900/ 115203 | consumed samples: 3558400 | consumed tokens: 7287603200 | elapsed time per iteration (s): 0.37 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 3.337044E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.521 | TFLOPs: 31.90 
| 0: [2023-03-16 20:23:48,948] [INFO] [logging.py:68:log_dist] [Rank 0] step=14000, skipped=0, lr=[0.00019442251142812213, 0.00019442251142812213, 0.00019442251142812213], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 14000/ 115203 | consumed samples: 3584000 | consumed tokens: 7340032000 | elapsed time per iteration (s): 0.38 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.333843E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.757 | TFLOPs: 31.82 | 0: steps: 14000 loss: 3.3374 iter time (s): 0.374 samples/sec: 684.236 7: iteration 14100/ 115203 | consumed samples: 3609600 | consumed tokens: 7392460800 | elapsed time per iteration (s): 0.38 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 3.336433E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.644 | TFLOPs: 31.86 | 7: iteration 14200/ 115203 | consumed samples: 3635200 | consumed tokens: 7444889600 | elapsed time per iteration (s): 0.37 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.331024E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.848 | TFLOPs: 31.87 | 7: iteration 14300/ 115203 | consumed samples: 3660800 | consumed tokens: 7497318400 | elapsed time per iteration (s): 0.38 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 3.331892E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.563 | TFLOPs: 31.77 | 7: iteration 14400/ 115203 | consumed samples: 3686400 | consumed tokens: 7549747200 | elapsed time per iteration (s): 0.38 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 3.323738E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 679.601 | TFLOPs: 31.72 | 7: iteration 14500/ 115203 | consumed samples: 3712000 | consumed tokens: 7602176000 | elapsed time per iteration (s): 0.38 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 3.325842E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.802 | TFLOPs: 31.82 | 7: iteration 14600/ 115203 | consumed samples: 3737600 | consumed tokens: 7654604800 | elapsed time per iteration (s): 0.38 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 3.339405E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.328 | TFLOPs: 31.85 | 7: iteration 14700/ 115203 | consumed samples: 3763200 | consumed tokens: 7707033600 | elapsed time per iteration (s): 0.38 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 3.316498E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.699 | TFLOPs: 31.82 | 7: iteration 14800/ 115203 | consumed samples: 3788800 | consumed tokens: 7759462400 | elapsed time per iteration (s): 0.38 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 3.317620E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.414 | TFLOPs: 31.85 | 7: iteration 14900/ 115203 | consumed samples: 3814400 | consumed tokens: 7811891200 | elapsed time per iteration (s): 0.38 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 3.316042E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.328 | TFLOPs: 31.80 | 7: iteration 15000/ 115203 | consumed samples: 3840000 | consumed tokens: 7864320000 | elapsed time per iteration (s): 0.38 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 3.312700E+00 | grad norm: 0.348 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.945 | TFLOPs: 31.83 | 7: iteration 15100/ 115203 | consumed samples: 3865600 | consumed tokens: 7916748800 | elapsed time per iteration (s): 0.38 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 3.314106E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.355 | TFLOPs: 31.80 | 7: iteration 15200/ 115203 | consumed samples: 3891200 | consumed tokens: 7969177600 | elapsed time per iteration (s): 0.38 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.312559E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.078 | TFLOPs: 31.79 | 7: iteration 15300/ 115203 | consumed samples: 3916800 | consumed tokens: 8021606400 | elapsed time per iteration (s): 0.38 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 3.308775E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.119 | TFLOPs: 31.75 | 7: iteration 15400/ 115203 | consumed samples: 3942400 | consumed tokens: 8074035200 | elapsed time per iteration (s): 0.38 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 3.308352E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.195 | TFLOPs: 31.80 | 7: iteration 15500/ 115203 | consumed samples: 3968000 | consumed tokens: 8126464000 | elapsed time per iteration (s): 0.38 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 3.303297E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.840 | TFLOPs: 31.78 | 7: iteration 15600/ 115203 | consumed samples: 3993600 | consumed tokens: 8178892800 | elapsed time per iteration (s): 0.38 | learning 
rate: 1.930E-04 | global batch size: 256 | lm loss: 3.304788E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.353 | TFLOPs: 31.80 | 7: iteration 15700/ 115203 | consumed samples: 4019200 | consumed tokens: 8231321600 | elapsed time per iteration (s): 0.38 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 3.302641E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.430 | TFLOPs: 31.71 | 7: iteration 15800/ 115203 | consumed samples: 4044800 | consumed tokens: 8283750400 | elapsed time per iteration (s): 0.38 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 3.298766E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.641 | TFLOPs: 31.77 | 7: iteration 15900/ 115203 | consumed samples: 4070400 | consumed tokens: 8336179200 | elapsed time per iteration (s): 0.38 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.300143E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.402 | TFLOPs: 31.71 | 0: [2023-03-16 20:36:20,607] [INFO] [logging.py:68:log_dist] [Rank 0] step=16000, skipped=0, lr=[0.00019257700559212364, 0.00019257700559212364, 0.00019257700559212364], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 16000/ 115203 | consumed samples: 4096000 | consumed tokens: 8388608000 | elapsed time per iteration (s): 0.38 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 3.296188E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.654 | TFLOPs: 31.77 | 0: steps: 16000 loss: 3.3295 iter time (s): 0.374 samples/sec: 684.834 7: iteration 16100/ 115203 | consumed samples: 4121600 | consumed tokens: 8441036800 | elapsed time per 
iteration (s): 0.38 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 3.295869E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.878 | TFLOPs: 31.69 | 7: iteration 16200/ 115203 | consumed samples: 4147200 | consumed tokens: 8493465600 | elapsed time per iteration (s): 0.38 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 3.290582E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.098 | TFLOPs: 31.74 | 7: iteration 16300/ 115203 | consumed samples: 4172800 | consumed tokens: 8545894400 | elapsed time per iteration (s): 0.38 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 3.288927E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.626 | TFLOPs: 31.77 | 7: iteration 16400/ 115203 | consumed samples: 4198400 | consumed tokens: 8598323200 | elapsed time per iteration (s): 0.38 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 3.289553E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.901 | TFLOPs: 31.78 | 7: iteration 16500/ 115203 | consumed samples: 4224000 | consumed tokens: 8650752000 | elapsed time per iteration (s): 0.38 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 3.288514E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.766 | TFLOPs: 31.78 | 7: iteration 16600/ 115203 | consumed samples: 4249600 | consumed tokens: 8703180800 | elapsed time per iteration (s): 0.38 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 3.284726E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.987 | TFLOPs: 31.79 | 7: iteration 
16700/ 115203 | consumed samples: 4275200 | consumed tokens: 8755609600 | elapsed time per iteration (s): 0.38 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 3.284472E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.528 | TFLOPs: 31.76 | 7: iteration 16800/ 115203 | consumed samples: 4300800 | consumed tokens: 8808038400 | elapsed time per iteration (s): 0.38 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 3.282026E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.804 | TFLOPs: 31.78 | 7: iteration 16900/ 115203 | consumed samples: 4326400 | consumed tokens: 8860467200 | elapsed time per iteration (s): 0.38 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 3.277841E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.509 | TFLOPs: 31.72 | 7: iteration 17000/ 115203 | consumed samples: 4352000 | consumed tokens: 8912896000 | elapsed time per iteration (s): 0.38 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 3.280028E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.599 | TFLOPs: 31.77 | 7: iteration 17100/ 115203 | consumed samples: 4377600 | consumed tokens: 8965324800 | elapsed time per iteration (s): 0.38 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 3.279636E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.522 | TFLOPs: 31.76 | 7: iteration 17200/ 115203 | consumed samples: 4403200 | consumed tokens: 9017753600 | elapsed time per iteration (s): 0.38 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 3.274473E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 680.204 | TFLOPs: 31.75 | 7: iteration 17300/ 115203 | consumed samples: 4428800 | consumed tokens: 9070182400 | elapsed time per iteration (s): 0.38 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 3.272494E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.713 | TFLOPs: 31.73 | 7: iteration 17400/ 115203 | consumed samples: 4454400 | consumed tokens: 9122611200 | elapsed time per iteration (s): 0.38 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 3.273976E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.318 | TFLOPs: 31.75 | 7: iteration 17500/ 115203 | consumed samples: 4480000 | consumed tokens: 9175040000 | elapsed time per iteration (s): 0.38 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.268686E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.101 | TFLOPs: 31.74 | 7: iteration 17600/ 115203 | consumed samples: 4505600 | consumed tokens: 9227468800 | elapsed time per iteration (s): 0.38 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 3.267637E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.391 | TFLOPs: 31.76 | 7: iteration 17700/ 115203 | consumed samples: 4531200 | consumed tokens: 9279897600 | elapsed time per iteration (s): 0.38 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 3.269246E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.520 | TFLOPs: 31.76 | 7: iteration 17800/ 115203 | consumed samples: 4556800 | consumed tokens: 9332326400 | elapsed time per iteration (s): 0.38 | learning rate: 1.907E-04 | global batch size: 256 | lm 
loss: 3.268909E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.944 | TFLOPs: 31.74 | 7: iteration 17900/ 115203 | consumed samples: 4582400 | consumed tokens: 9384755200 | elapsed time per iteration (s): 0.38 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 3.267029E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.622 | TFLOPs: 31.77 | 0: [2023-03-16 20:48:53,136] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=0, lr=[0.00019048094388569267, 0.00019048094388569267, 0.00019048094388569267], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 18000/ 115203 | consumed samples: 4608000 | consumed tokens: 9437184000 | elapsed time per iteration (s): 0.38 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 3.262947E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.433 | TFLOPs: 31.81 | 0: steps: 18000 loss: 3.2855 iter time (s): 0.374 samples/sec: 684.094 7: iteration 18100/ 115203 | consumed samples: 4633600 | consumed tokens: 9489612800 | elapsed time per iteration (s): 0.38 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 3.259184E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.297 | TFLOPs: 31.75 | 7: iteration 18200/ 115203 | consumed samples: 4659200 | consumed tokens: 9542041600 | elapsed time per iteration (s): 0.38 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 3.258158E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.416 | TFLOPs: 31.81 | 7: iteration 18300/ 115203 | consumed samples: 4684800 | consumed tokens: 9594470400 | elapsed time per iteration (s): 0.38 | learning rate: 1.901E-04 | 
global batch size: 256 | lm loss: 3.262242E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.754 | TFLOPs: 31.82 | 7: iteration 18400/ 115203 | consumed samples: 4710400 | consumed tokens: 9646899200 | elapsed time per iteration (s): 0.38 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 3.260729E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.445 | TFLOPs: 31.81 | 7: iteration 18500/ 115203 | consumed samples: 4736000 | consumed tokens: 9699328000 | elapsed time per iteration (s): 0.38 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 3.251843E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.483 | TFLOPs: 31.81 | 7: iteration 18600/ 115203 | consumed samples: 4761600 | consumed tokens: 9751756800 | elapsed time per iteration (s): 0.38 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 3.251968E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.486 | TFLOPs: 31.81 | 7: iteration 18700/ 115203 | consumed samples: 4787200 | consumed tokens: 9804185600 | elapsed time per iteration (s): 0.38 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 3.252914E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.466 | TFLOPs: 31.81 | 7: iteration 18800/ 115203 | consumed samples: 4812800 | consumed tokens: 9856614400 | elapsed time per iteration (s): 0.38 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 3.256257E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.985 | TFLOPs: 31.79 | 7: iteration 18900/ 115203 | consumed samples: 4838400 | 
consumed tokens: 9909043200 | elapsed time per iteration (s): 0.38 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 3.250718E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.253 | TFLOPs: 31.80 | 7: iteration 19000/ 115203 | consumed samples: 4864000 | consumed tokens: 9961472000 | elapsed time per iteration (s): 0.38 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 3.248806E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.191 | TFLOPs: 31.80 | 7: iteration 19100/ 115203 | consumed samples: 4889600 | consumed tokens: 10013900800 | elapsed time per iteration (s): 0.38 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.249985E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.054 | TFLOPs: 31.79 | 7: iteration 19200/ 115203 | consumed samples: 4915200 | consumed tokens: 10066329600 | elapsed time per iteration (s): 0.38 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 3.249693E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.622 | TFLOPs: 31.82 | 7: iteration 19300/ 115203 | consumed samples: 4940800 | consumed tokens: 10118758400 | elapsed time per iteration (s): 0.38 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 3.242921E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.544 | TFLOPs: 31.81 | 7: iteration 19400/ 115203 | consumed samples: 4966400 | consumed tokens: 10171187200 | elapsed time per iteration (s): 0.38 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 3.244362E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 680.720 | TFLOPs: 31.77 |
7: iteration 19500/ 115203 | consumed samples: 4992000 | consumed tokens: 10223616000 | elapsed time per iteration (s): 0.38 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 3.246279E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.522 | TFLOPs: 31.76 |
7: iteration 19600/ 115203 | consumed samples: 5017600 | consumed tokens: 10276044800 | elapsed time per iteration (s): 0.38 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 3.243002E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.157 | TFLOPs: 31.79 |
7: iteration 19700/ 115203 | consumed samples: 5043200 | consumed tokens: 10328473600 | elapsed time per iteration (s): 0.38 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 3.241749E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.543 | TFLOPs: 31.81 |
7: iteration 19800/ 115203 | consumed samples: 5068800 | consumed tokens: 10380902400 | elapsed time per iteration (s): 0.38 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 3.238343E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.251 | TFLOPs: 31.80 |
7: iteration 19900/ 115203 | consumed samples: 5094400 | consumed tokens: 10433331200 | elapsed time per iteration (s): 0.38 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 3.234411E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.312 | TFLOPs: 31.80 |
0: [2023-03-16 21:01:24,707] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=0, lr=[0.00018814068619753637, 0.00018814068619753637, 0.00018814068619753637], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 20000/ 115203 | consumed samples: 5120000 | consumed tokens: 10485760000 | elapsed time per iteration (s): 0.38 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 3.236018E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.290 | TFLOPs: 31.80 |
0: steps: 20000 loss: 3.2428 iter time (s): 0.374 samples/sec: 684.770
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 20000 | lm loss value: 3.752458E+00 | lm loss PPL: 4.262572E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 20000 to checkpoints_146m60b100m
0: [2023-03-16 21:01:24,836] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step20000 is begin to save!
0: [2023-03-16 21:01:24,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_01-model_00-model_states.pt...
0: [2023-03-16 21:01:24,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_01-model_00-model_states.pt.
0: [2023-03-16 21:01:24,948] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_03-model_00-model_states.pt...
0: [2023-03-16 21:01:24,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_03-model_00-model_states.pt.
0: [2023-03-16 21:01:24,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_04-model_00-model_states.pt...
0: [2023-03-16 21:01:24,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_04-model_00-model_states.pt.
0: [2023-03-16 21:01:24,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_05-model_00-model_states.pt... 0: [2023-03-16 21:01:24,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_05-model_00-model_states.pt. 0: [2023-03-16 21:01:24,994] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_06-model_00-model_states.pt... 0: [2023-03-16 21:01:25,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_06-model_00-model_states.pt. 0: [2023-03-16 21:01:25,009] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_07-model_00-model_states.pt... 0: [2023-03-16 21:01:25,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_07-model_00-model_states.pt. 0: [2023-03-16 21:01:25,024] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_08-model_00-model_states.pt... 0: [2023-03-16 21:01:25,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_08-model_00-model_states.pt. 0: [2023-03-16 21:01:25,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_09-model_00-model_states.pt... 0: [2023-03-16 21:01:25,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_09-model_00-model_states.pt. 0: [2023-03-16 21:01:25,055] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_10-model_00-model_states.pt... 0: [2023-03-16 21:01:25,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_10-model_00-model_states.pt. 
0: [2023-03-16 21:01:25,070] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_11-model_00-model_states.pt... 0: [2023-03-16 21:01:25,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_11-model_00-model_states.pt. 0: [2023-03-16 21:01:25,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_12-model_00-model_states.pt... 0: [2023-03-16 21:01:25,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_12-model_00-model_states.pt. 0: [2023-03-16 21:01:25,100] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_13-model_00-model_states.pt... 0: [2023-03-16 21:01:25,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_13-model_00-model_states.pt. 0: [2023-03-16 21:01:25,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_14-model_00-model_states.pt... 0: [2023-03-16 21:01:25,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_14-model_00-model_states.pt. 0: [2023-03-16 21:01:25,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_15-model_00-model_states.pt... 0: [2023-03-16 21:01:25,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_15-model_00-model_states.pt. 0: [2023-03-16 21:01:25,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_16-model_00-model_states.pt... 0: [2023-03-16 21:01:25,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_16-model_00-model_states.pt. 
0: [2023-03-16 21:01:25,161] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_17-model_00-model_states.pt... 0: [2023-03-16 21:01:25,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_17-model_00-model_states.pt. 0: [2023-03-16 21:01:25,176] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/layer_19-model_00-model_states.pt... 0: [2023-03-16 21:01:25,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/layer_19-model_00-model_states.pt. 0: [2023-03-16 21:01:25,178] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step20000/mp_rank_00_model_states.pt 0: [2023-03-16 21:01:25,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/mp_rank_00_model_states.pt... 0: [2023-03-16 21:01:25,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/mp_rank_00_model_states.pt. 0: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
2: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 3: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 4: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 5: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 7: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 3: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 4: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 5: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 7: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 0: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
3: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 4: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 7: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-16 21:01:25,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-16 21:01:25,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-16 21:01:25,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-16 21:01:25,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-16 21:01:25,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-16 21:01:25,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-16 21:01:25,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
0: [2023-03-16 21:01:25,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-16 21:01:25,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-16 21:01:25,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-16 21:01:25,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-16 21:01:25,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-16 21:01:25,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-16 21:01:25,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-16 21:01:25,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-16 21:01:25,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-16 21:01:25,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-16 21:01:25,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-16 21:01:25,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-16 21:01:25,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-16 21:01:25,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-16 21:01:25,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-16 21:01:25,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-16 21:01:25,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-16 21:01:25,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-16 21:01:25,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-16 21:01:25,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-16 21:01:25,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-16 21:01:25,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-16 21:01:25,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-16 21:01:25,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-16 21:01:25,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-16 21:01:25,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-16 21:01:25,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-16 21:01:25,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-16 21:01:25,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-16 21:01:25,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-16 21:01:25,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-16 21:01:25,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-16 21:01:25,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-16 21:01:25,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-16 21:01:25,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
6: [2023-03-16 21:01:25,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-16 21:01:25,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-16 21:01:25,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-16 21:01:25,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-16 21:01:25,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-16 21:01:25,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-16 21:01:25,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-16 21:01:25,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-16 21:01:25,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-16 21:01:25,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-16 21:01:25,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-16 21:01:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero 
checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-16 21:01:25,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
5: [2023-03-16 21:01:25,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-16 21:01:25,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-16 21:01:25,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-16 21:01:25,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-16 21:01:25,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-16 21:01:25,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-16 21:01:25,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-16 21:01:25,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-16 21:01:25,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-16 21:01:25,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-16 21:01:25,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-16 21:01:25,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
1: [2023-03-16 21:01:25,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-16 21:01:25,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-16 21:01:25,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-16 21:01:25,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-16 21:01:25,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-16 21:01:25,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-16 21:01:25,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-16 21:01:25,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-16 21:01:25,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-16 21:01:25,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-16 21:01:25,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-16 21:01:25,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-16 21:01:25,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-16 21:01:25,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-16 21:01:25,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-16 21:01:25,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-16 21:01:25,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-16 21:01:25,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-16 21:01:25,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-16 21:01:25,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
3: [2023-03-16 21:01:25,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-16 21:01:25,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-16 21:01:25,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-16 21:01:25,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-16 21:01:25,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-16 21:01:25,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-16 21:01:25,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-16 21:01:25,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-16 21:01:25,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-16 21:01:25,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
2: [2023-03-16 21:01:25,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-16 21:01:25,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-16 21:01:25,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-16 21:01:25,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-16 21:01:25,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-16 21:01:25,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-16 21:01:25,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-16 21:01:25,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-16 21:01:25,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-16 21:01:25,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-16 21:01:25,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-16 21:01:25,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-16 21:01:25,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-16 21:01:25,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-16 21:01:25,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-16 21:01:25,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-16 21:01:25,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-16 21:01:25,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-16 21:01:25,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-16 21:01:25,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-16 21:01:25,281] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-16 21:01:25,281] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-16 21:01:25,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-16 21:01:25,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-16 21:01:25,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-16 21:01:25,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-16 21:01:25,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-16 21:01:25,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
0: successfully saved checkpoint at iteration 20000 to checkpoints_146m60b100m 7: time (ms) | save-checkpoint: 457.47 7: iteration 20100/ 115203 | consumed samples: 5145600 | consumed tokens: 10538188800 | elapsed time per iteration (s): 0.38 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 3.235599E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.706 | TFLOPs: 31.35 | 7: iteration 20200/ 115203 | consumed samples: 5171200 | consumed tokens: 10590617600 | elapsed time per iteration (s): 0.38 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 3.237645E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.806 | TFLOPs: 31.82 | 7: iteration 20300/ 115203 | consumed samples: 5196800 | consumed tokens: 10643046400 | elapsed time per iteration (s): 0.38 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 3.230366E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.183 | TFLOPs: 31.80 | 7: iteration 20400/ 115203 | consumed samples: 5222400 | consumed tokens: 10695475200 | elapsed time per iteration (s): 0.38 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 3.231848E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.136 | TFLOPs: 31.14 | 7: iteration 20500/ 115203 | consumed samples: 5248000 | consumed tokens: 10747904000 | elapsed time per iteration (s): 0.39 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.230318E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 657.738 | TFLOPs: 30.70 | 7: iteration 20600/ 115203 | consumed samples: 5273600 | consumed tokens: 10800332800 | elapsed time per iteration (s): 0.39 | learning 
rate: 1.874E-04 | global batch size: 256 | lm loss: 3.226898E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 658.195 | TFLOPs: 30.72 | 7: iteration 20700/ 115203 | consumed samples: 5299200 | consumed tokens: 10852761600 | elapsed time per iteration (s): 0.39 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 3.227829E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 656.158 | TFLOPs: 30.63 | 7: iteration 20800/ 115203 | consumed samples: 5324800 | consumed tokens: 10905190400 | elapsed time per iteration (s): 0.39 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 3.222620E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 661.311 | TFLOPs: 30.87 | 7: iteration 20900/ 115203 | consumed samples: 5350400 | consumed tokens: 10957619200 | elapsed time per iteration (s): 0.39 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 3.224350E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 655.895 | TFLOPs: 30.61 | 7: iteration 21000/ 115203 | consumed samples: 5376000 | consumed tokens: 11010048000 | elapsed time per iteration (s): 0.39 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 3.222415E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.945 | TFLOPs: 30.99 | 7: iteration 21100/ 115203 | consumed samples: 5401600 | consumed tokens: 11062476800 | elapsed time per iteration (s): 0.39 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 3.226923E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.283 | TFLOPs: 31.01 | 7: iteration 21200/ 115203 | consumed 
samples: 5427200 | consumed tokens: 11114905600 | elapsed time per iteration (s): 0.39 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 3.223450E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 658.011 | TFLOPs: 30.71 | 7: iteration 21300/ 115203 | consumed samples: 5452800 | consumed tokens: 11167334400 | elapsed time per iteration (s): 0.39 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 3.218152E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.871 | TFLOPs: 30.99 | 7: iteration 21400/ 115203 | consumed samples: 5478400 | consumed tokens: 11219763200 | elapsed time per iteration (s): 0.39 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 3.220018E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 661.156 | TFLOPs: 30.86 | 7: iteration 21500/ 115203 | consumed samples: 5504000 | consumed tokens: 11272192000 | elapsed time per iteration (s): 0.38 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 3.217674E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.242 | TFLOPs: 31.19 | 7: iteration 21600/ 115203 | consumed samples: 5529600 | consumed tokens: 11324620800 | elapsed time per iteration (s): 0.39 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 3.219724E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 655.048 | TFLOPs: 30.58 | 7: iteration 21700/ 115203 | consumed samples: 5555200 | consumed tokens: 11377049600 | elapsed time per iteration (s): 0.39 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 3.212410E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 664.569 | TFLOPs: 31.02 | 7: iteration 21800/ 115203 | consumed samples: 5580800 | consumed tokens: 11429478400 | elapsed time per iteration (s): 0.39 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.212636E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 659.244 | TFLOPs: 30.77 | 7: iteration 21900/ 115203 | consumed samples: 5606400 | consumed tokens: 11481907200 | elapsed time per iteration (s): 0.39 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 3.214604E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 651.629 | TFLOPs: 30.42 | 0: [2023-03-16 21:14:17,416] [INFO] [logging.py:68:log_dist] [Rank 0] step=22000, skipped=0, lr=[0.00018556333335793902, 0.00018556333335793902, 0.00018556333335793902], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 22000/ 115203 | consumed samples: 5632000 | consumed tokens: 11534336000 | elapsed time per iteration (s): 0.39 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 3.212226E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 652.920 | TFLOPs: 30.48 | 0: steps: 22000 loss: 3.2126 iter time (s): 0.384 samples/sec: 666.504 7: iteration 22100/ 115203 | consumed samples: 5657600 | consumed tokens: 11586764800 | elapsed time per iteration (s): 0.39 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 3.215702E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 649.433 | TFLOPs: 30.31 | 7: iteration 22200/ 115203 | consumed samples: 5683200 | consumed tokens: 11639193600 | elapsed time per iteration (s): 0.38 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 3.210232E+00 | grad norm: 0.339 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.076 | TFLOPs: 31.18 | 7: iteration 22300/ 115203 | consumed samples: 5708800 | consumed tokens: 11691622400 | elapsed time per iteration (s): 0.39 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 3.210496E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 658.814 | TFLOPs: 30.75 | 7: iteration 22400/ 115203 | consumed samples: 5734400 | consumed tokens: 11744051200 | elapsed time per iteration (s): 0.39 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 3.208841E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 658.862 | TFLOPs: 30.75 | 7: iteration 22500/ 115203 | consumed samples: 5760000 | consumed tokens: 11796480000 | elapsed time per iteration (s): 0.38 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 3.206376E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.684 | TFLOPs: 31.12 | 7: iteration 22600/ 115203 | consumed samples: 5785600 | consumed tokens: 11848908800 | elapsed time per iteration (s): 0.39 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 3.211146E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 660.059 | TFLOPs: 30.81 | 7: iteration 22700/ 115203 | consumed samples: 5811200 | consumed tokens: 11901337600 | elapsed time per iteration (s): 0.39 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 3.206127E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 657.756 | TFLOPs: 30.70 | 7: iteration 22800/ 115203 | consumed samples: 5836800 | consumed tokens: 11953766400 | elapsed time per iteration (s): 0.39 | learning rate: 1.845E-04 | 
global batch size: 256 | lm loss: 3.205816E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.110 | TFLOPs: 30.95 | 7: iteration 22900/ 115203 | consumed samples: 5862400 | consumed tokens: 12006195200 | elapsed time per iteration (s): 0.38 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 3.202445E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.023 | TFLOPs: 31.09 | 7: iteration 23000/ 115203 | consumed samples: 5888000 | consumed tokens: 12058624000 | elapsed time per iteration (s): 0.38 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 3.204269E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.007 | TFLOPs: 31.18 | 7: iteration 23100/ 115203 | consumed samples: 5913600 | consumed tokens: 12111052800 | elapsed time per iteration (s): 0.39 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 3.202217E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 660.805 | TFLOPs: 30.84 | 7: iteration 23200/ 115203 | consumed samples: 5939200 | consumed tokens: 12163481600 | elapsed time per iteration (s): 0.38 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 3.201076E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.597 | TFLOPs: 31.30 | 7: iteration 23300/ 115203 | consumed samples: 5964800 | consumed tokens: 12215910400 | elapsed time per iteration (s): 0.38 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 3.202009E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.067 | TFLOPs: 31.09 | 7: iteration 23400/ 115203 | consumed samples: 5990400 | 
consumed tokens: 12268339200 | elapsed time per iteration (s): 0.38 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 3.199352E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.547 | TFLOPs: 31.30 | 7: iteration 23500/ 115203 | consumed samples: 6016000 | consumed tokens: 12320768000 | elapsed time per iteration (s): 0.38 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 3.202464E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.049 | TFLOPs: 31.23 | 7: iteration 23600/ 115203 | consumed samples: 6041600 | consumed tokens: 12373196800 | elapsed time per iteration (s): 0.39 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 3.201653E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.381 | TFLOPs: 31.01 | 7: iteration 23700/ 115203 | consumed samples: 6067200 | consumed tokens: 12425625600 | elapsed time per iteration (s): 0.38 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 3.195366E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.939 | TFLOPs: 31.27 | 7: iteration 23800/ 115203 | consumed samples: 6092800 | consumed tokens: 12478054400 | elapsed time per iteration (s): 0.38 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 3.195291E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.894 | TFLOPs: 31.45 | 7: iteration 23900/ 115203 | consumed samples: 6118400 | consumed tokens: 12530483200 | elapsed time per iteration (s): 0.38 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 3.191921E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 674.009 | TFLOPs: 31.46 | 0: [2023-03-16 21:27:06,736] [INFO] [logging.py:68:log_dist] [Rank 0] step=24000, skipped=0, lr=[0.00018275670559336077, 0.00018275670559336077, 0.00018275670559336077], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 24000/ 115203 | consumed samples: 6144000 | consumed tokens: 12582912000 | elapsed time per iteration (s): 0.38 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 3.192054E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.555 | TFLOPs: 31.53 | 0: steps: 24000 loss: 3.1998 iter time (s): 0.383 samples/sec: 669.211 7: iteration 24100/ 115203 | consumed samples: 6169600 | consumed tokens: 12635340800 | elapsed time per iteration (s): 0.38 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 3.189301E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.820 | TFLOPs: 31.40 | 7: iteration 24200/ 115203 | consumed samples: 6195200 | consumed tokens: 12687769600 | elapsed time per iteration (s): 0.38 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 3.191049E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.980 | TFLOPs: 31.55 | 7: iteration 24300/ 115203 | consumed samples: 6220800 | consumed tokens: 12740198400 | elapsed time per iteration (s): 0.38 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.190548E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.237 | TFLOPs: 31.42 | 7: iteration 24400/ 115203 | consumed samples: 6246400 | consumed tokens: 12792627200 | elapsed time per iteration (s): 0.38 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 3.188528E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 675.210 | TFLOPs: 31.52 | 7: iteration 24500/ 115203 | consumed samples: 6272000 | consumed tokens: 12845056000 | elapsed time per iteration (s): 0.38 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 3.191011E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.758 | TFLOPs: 31.40 | 7: iteration 24600/ 115203 | consumed samples: 6297600 | consumed tokens: 12897484800 | elapsed time per iteration (s): 0.38 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 3.188860E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.985 | TFLOPs: 31.60 | 7: iteration 24700/ 115203 | consumed samples: 6323200 | consumed tokens: 12949913600 | elapsed time per iteration (s): 0.38 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 3.184141E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.733 | TFLOPs: 31.63 | 7: iteration 24800/ 115203 | consumed samples: 6348800 | consumed tokens: 13002342400 | elapsed time per iteration (s): 0.38 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 3.184653E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.709 | TFLOPs: 31.49 | 7: iteration 24900/ 115203 | consumed samples: 6374400 | consumed tokens: 13054771200 | elapsed time per iteration (s): 0.38 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 3.182930E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.536 | TFLOPs: 31.58 | 7: iteration 25000/ 115203 | consumed samples: 6400000 | consumed tokens: 13107200000 | elapsed time per iteration (s): 0.38 | learning rate: 1.813E-04 | global batch size: 256 | 
lm loss: 3.185225E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.055 | TFLOPs: 31.60 | 7: iteration 25100/ 115203 | consumed samples: 6425600 | consumed tokens: 13159628800 | elapsed time per iteration (s): 0.38 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 3.182067E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.747 | TFLOPs: 31.45 | 7: iteration 25200/ 115203 | consumed samples: 6451200 | consumed tokens: 13212057600 | elapsed time per iteration (s): 0.38 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 3.185917E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.694 | TFLOPs: 31.68 | 7: iteration 25300/ 115203 | consumed samples: 6476800 | consumed tokens: 13264486400 | elapsed time per iteration (s): 0.38 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 3.182505E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.579 | TFLOPs: 31.72 | 7: iteration 25400/ 115203 | consumed samples: 6502400 | consumed tokens: 13316915200 | elapsed time per iteration (s): 0.38 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 3.181649E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.537 | TFLOPs: 31.67 | 7: iteration 25500/ 115203 | consumed samples: 6528000 | consumed tokens: 13369344000 | elapsed time per iteration (s): 0.38 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 3.180153E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.581 | TFLOPs: 31.77 | 7: iteration 25600/ 115203 | consumed samples: 6553600 | consumed tokens: 
13421772800 | elapsed time per iteration (s): 0.38 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 3.178896E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.632 | TFLOPs: 31.77 | 7: iteration 25700/ 115203 | consumed samples: 6579200 | consumed tokens: 13474201600 | elapsed time per iteration (s): 0.38 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 3.176281E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.181 | TFLOPs: 31.80 | 7: iteration 25800/ 115203 | consumed samples: 6604800 | consumed tokens: 13526630400 | elapsed time per iteration (s): 0.38 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 3.181899E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.491 | TFLOPs: 31.81 | 7: iteration 25900/ 115203 | consumed samples: 6630400 | consumed tokens: 13579059200 | elapsed time per iteration (s): 0.38 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 3.176365E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.921 | TFLOPs: 31.83 | 0: [2023-03-16 21:39:42,363] [INFO] [logging.py:68:log_dist] [Rank 0] step=26000, skipped=0, lr=[0.00017972931879823854, 0.00017972931879823854, 0.00017972931879823854], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 26000/ 115203 | consumed samples: 6656000 | consumed tokens: 13631488000 | elapsed time per iteration (s): 0.38 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 3.174571E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.553 | TFLOPs: 31.86 | 0: steps: 26000 loss: 3.1950 iter time (s): 0.376 samples/sec: 681.173 7: iteration 26100/ 115203 | consumed 
samples: 6681600 | consumed tokens: 13683916800 | elapsed time per iteration (s): 0.38 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 3.175020E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.615 | TFLOPs: 31.86 | 7: iteration 26200/ 115203 | consumed samples: 6707200 | consumed tokens: 13736345600 | elapsed time per iteration (s): 0.38 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 3.171677E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.078 | TFLOPs: 31.65 | 7: iteration 26300/ 115203 | consumed samples: 6732800 | consumed tokens: 13788774400 | elapsed time per iteration (s): 0.37 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 3.179593E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.881 | TFLOPs: 31.87 | 7: iteration 26400/ 115203 | consumed samples: 6758400 | consumed tokens: 13841203200 | elapsed time per iteration (s): 0.38 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 3.171591E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.576 | TFLOPs: 31.86 | 7: iteration 26500/ 115203 | consumed samples: 6784000 | consumed tokens: 13893632000 | elapsed time per iteration (s): 0.38 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 3.175758E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.442 | TFLOPs: 31.71 | 7: iteration 26600/ 115203 | consumed samples: 6809600 | consumed tokens: 13946060800 | elapsed time per iteration (s): 0.38 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 3.170164E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 680.664 | TFLOPs: 31.77 | 7: iteration 26700/ 115203 | consumed samples: 6835200 | consumed tokens: 13998489600 | elapsed time per iteration (s): 0.37 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 3.172083E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.913 | TFLOPs: 31.88 | 7: iteration 26800/ 115203 | consumed samples: 6860800 | consumed tokens: 14050918400 | elapsed time per iteration (s): 0.37 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 3.174631E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.289 | TFLOPs: 31.89 | 7: iteration 26900/ 115203 | consumed samples: 6886400 | consumed tokens: 14103347200 | elapsed time per iteration (s): 0.38 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 3.172884E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.197 | TFLOPs: 31.80 | 7: iteration 27000/ 115203 | consumed samples: 6912000 | consumed tokens: 14155776000 | elapsed time per iteration (s): 0.38 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 3.170733E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.165 | TFLOPs: 31.84 | 7: iteration 27100/ 115203 | consumed samples: 6937600 | consumed tokens: 14208204800 | elapsed time per iteration (s): 0.38 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 3.169637E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.718 | TFLOPs: 31.77 | 7: iteration 27200/ 115203 | consumed samples: 6963200 | consumed tokens: 14260633600 | elapsed time per iteration (s): 0.38 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 
3.167655E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.955 | TFLOPs: 31.74 | 7: iteration 27300/ 115203 | consumed samples: 6988800 | consumed tokens: 14313062400 | elapsed time per iteration (s): 0.37 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 3.163120E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.000 | TFLOPs: 31.88 | 7: iteration 27400/ 115203 | consumed samples: 7014400 | consumed tokens: 14365491200 | elapsed time per iteration (s): 0.37 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 3.160518E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.276 | TFLOPs: 31.89 | 7: iteration 27500/ 115203 | consumed samples: 7040000 | consumed tokens: 14417920000 | elapsed time per iteration (s): 0.38 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 3.166499E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.015 | TFLOPs: 31.74 | 7: iteration 27600/ 115203 | consumed samples: 7065600 | consumed tokens: 14470348800 | elapsed time per iteration (s): 0.38 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 3.168377E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.167 | TFLOPs: 31.79 | 7: iteration 27700/ 115203 | consumed samples: 7091200 | consumed tokens: 14522777600 | elapsed time per iteration (s): 0.38 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 3.166710E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.096 | TFLOPs: 31.84 | 7: iteration 27800/ 115203 | consumed samples: 7116800 | consumed tokens: 14575206400 | 
elapsed time per iteration (s): 0.38 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 3.162841E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.514 | TFLOPs: 31.86 | 7: iteration 27900/ 115203 | consumed samples: 7142400 | consumed tokens: 14627635200 | elapsed time per iteration (s): 0.38 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 3.158737E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.497 | TFLOPs: 31.86 | 0: [2023-03-16 21:52:13,496] [INFO] [logging.py:68:log_dist] [Rank 0] step=28000, skipped=0, lr=[0.00017649035869598463, 0.00017649035869598463, 0.00017649035869598463], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 28000/ 115203 | consumed samples: 7168000 | consumed tokens: 14680064000 | elapsed time per iteration (s): 0.38 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 3.156910E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.553 | TFLOPs: 31.81 | 0: steps: 28000 loss: 3.1742 iter time (s): 0.374 samples/sec: 685.324 7: iteration 28100/ 115203 | consumed samples: 7193600 | consumed tokens: 14732492800 | elapsed time per iteration (s): 0.37 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 3.159809E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.685 | TFLOPs: 31.87 | 7: iteration 28200/ 115203 | consumed samples: 7219200 | consumed tokens: 14784921600 | elapsed time per iteration (s): 0.37 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 3.163224E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.750 | TFLOPs: 31.91 | 7: iteration 28300/ 115203 | consumed samples: 7244800 
| consumed tokens: 14837350400 | elapsed time per iteration (s): 0.37 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 3.160435E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.954 | TFLOPs: 31.88 | 7: iteration 28400/ 115203 | consumed samples: 7270400 | consumed tokens: 14889779200 | elapsed time per iteration (s): 0.37 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 3.155622E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.321 | TFLOPs: 31.89 | 7: iteration 28500/ 115203 | consumed samples: 7296000 | consumed tokens: 14942208000 | elapsed time per iteration (s): 0.37 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 3.159859E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.668 | TFLOPs: 31.86 | 7: iteration 28600/ 115203 | consumed samples: 7321600 | consumed tokens: 14994636800 | elapsed time per iteration (s): 0.37 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 3.156768E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.571 | TFLOPs: 31.91 | 7: iteration 28700/ 115203 | consumed samples: 7347200 | consumed tokens: 15047065600 | elapsed time per iteration (s): 0.37 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 3.156943E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.739 | TFLOPs: 31.87 | 7: iteration 28800/ 115203 | consumed samples: 7372800 | consumed tokens: 15099494400 | elapsed time per iteration (s): 0.38 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 3.154863E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 682.371 | TFLOPs: 31.85 | 7: iteration 28900/ 115203 | consumed samples: 7398400 | consumed tokens: 15151923200 | elapsed time per iteration (s): 0.38 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 3.149339E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.053 | TFLOPs: 31.84 | 7: iteration 29000/ 115203 | consumed samples: 7424000 | consumed tokens: 15204352000 | elapsed time per iteration (s): 0.38 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 3.160522E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.986 | TFLOPs: 31.83 | 7: iteration 29100/ 115203 | consumed samples: 7449600 | consumed tokens: 15256780800 | elapsed time per iteration (s): 0.38 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 3.155705E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.141 | TFLOPs: 31.84 | 7: iteration 29200/ 115203 | consumed samples: 7475200 | consumed tokens: 15309209600 | elapsed time per iteration (s): 0.38 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 3.146628E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.188 | TFLOPs: 31.84 | 7: iteration 29300/ 115203 | consumed samples: 7500800 | consumed tokens: 15361638400 | elapsed time per iteration (s): 0.38 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 3.151369E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.186 | TFLOPs: 31.84 | 7: iteration 29400/ 115203 | consumed samples: 7526400 | consumed tokens: 15414067200 | elapsed time per iteration (s): 0.38 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 3.153183E+00 | grad 
norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.080 | TFLOPs: 31.84 | 7: iteration 29500/ 115203 | consumed samples: 7552000 | consumed tokens: 15466496000 | elapsed time per iteration (s): 0.38 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 3.154760E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.069 | TFLOPs: 31.84 | 7: iteration 29600/ 115203 | consumed samples: 7577600 | consumed tokens: 15518924800 | elapsed time per iteration (s): 0.38 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 3.149746E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.070 | TFLOPs: 31.84 | 7: iteration 29700/ 115203 | consumed samples: 7603200 | consumed tokens: 15571353600 | elapsed time per iteration (s): 0.38 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 3.147081E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.386 | TFLOPs: 31.80 | 7: iteration 29800/ 115203 | consumed samples: 7628800 | consumed tokens: 15623782400 | elapsed time per iteration (s): 0.38 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 3.153542E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.958 | TFLOPs: 31.83 | 7: iteration 29900/ 115203 | consumed samples: 7654400 | consumed tokens: 15676211200 | elapsed time per iteration (s): 0.38 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 3.143490E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.319 | TFLOPs: 31.85 | 0: [2023-03-16 22:04:43,784] [INFO] [logging.py:68:log_dist] [Rank 0] step=30000, skipped=0, 
lr=[0.00017304965296758478, 0.00017304965296758478, 0.00017304965296758478], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 30000/ 115203 | consumed samples: 7680000 | consumed tokens: 15728640000 | elapsed time per iteration (s): 0.38 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 3.145585E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.812 | TFLOPs: 31.82 | 0: steps: 30000 loss: 3.1977 iter time (s): 0.373 samples/sec: 685.958 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 30000 | lm loss value: 3.771556E+00 | lm loss PPL: 4.344762E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 30000 to checkpoints_146m60b100m 0: [2023-03-16 22:04:43,912] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step30000 is begin to save! 0: [2023-03-16 22:04:43,916] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_01-model_00-model_states.pt... 0: [2023-03-16 22:04:44,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_01-model_00-model_states.pt. 0: [2023-03-16 22:04:44,010] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_03-model_00-model_states.pt... 0: [2023-03-16 22:04:44,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_03-model_00-model_states.pt. 0: [2023-03-16 22:04:44,026] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_04-model_00-model_states.pt... 
0: [2023-03-16 22:04:44,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_04-model_00-model_states.pt. 0: [2023-03-16 22:04:44,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_05-model_00-model_states.pt... 0: [2023-03-16 22:04:44,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_05-model_00-model_states.pt. 0: [2023-03-16 22:04:44,056] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_06-model_00-model_states.pt... 0: [2023-03-16 22:04:44,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_06-model_00-model_states.pt. 0: [2023-03-16 22:04:44,071] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_07-model_00-model_states.pt... 0: [2023-03-16 22:04:44,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_07-model_00-model_states.pt. 0: [2023-03-16 22:04:44,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_08-model_00-model_states.pt... 0: [2023-03-16 22:04:44,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_08-model_00-model_states.pt. 0: [2023-03-16 22:04:44,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_09-model_00-model_states.pt... 0: [2023-03-16 22:04:44,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_09-model_00-model_states.pt. 0: [2023-03-16 22:04:44,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_10-model_00-model_states.pt... 
0: [2023-03-16 22:04:44,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_10-model_00-model_states.pt. 0: [2023-03-16 22:04:44,130] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_11-model_00-model_states.pt... 0: [2023-03-16 22:04:44,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_11-model_00-model_states.pt. 0: [2023-03-16 22:04:44,145] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_12-model_00-model_states.pt... 0: [2023-03-16 22:04:44,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_12-model_00-model_states.pt. 0: [2023-03-16 22:04:44,160] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_13-model_00-model_states.pt... 0: [2023-03-16 22:04:44,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_13-model_00-model_states.pt. 0: [2023-03-16 22:04:44,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_14-model_00-model_states.pt... 0: [2023-03-16 22:04:44,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_14-model_00-model_states.pt. 0: [2023-03-16 22:04:44,190] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_15-model_00-model_states.pt... 0: [2023-03-16 22:04:44,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_15-model_00-model_states.pt. 0: [2023-03-16 22:04:44,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_16-model_00-model_states.pt... 
0: [2023-03-16 22:04:44,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_16-model_00-model_states.pt. 0: [2023-03-16 22:04:44,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_17-model_00-model_states.pt... 0: [2023-03-16 22:04:44,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_17-model_00-model_states.pt. 0: [2023-03-16 22:04:44,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/layer_19-model_00-model_states.pt... 0: [2023-03-16 22:04:44,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/layer_19-model_00-model_states.pt. 0: [2023-03-16 22:04:44,238] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step30000/mp_rank_00_model_states.pt 0: [2023-03-16 22:04:44,238] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/mp_rank_00_model_states.pt... 0: [2023-03-16 22:04:44,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/mp_rank_00_model_states.pt. 0: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
0: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 3: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 4: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 5: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 7: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-16 22:04:44,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-16 22:04:44,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-16 22:04:44,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 2: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 3: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 4: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 5: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
7: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2023-03-16 22:04:44,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-16 22:04:44,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-16 22:04:44,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-16 22:04:44,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-16 22:04:44,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-16 22:04:44,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-16 22:04:44,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-16 22:04:44,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-16 22:04:44,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-16 22:04:44,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-16 22:04:44,298] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-16 22:04:44,298] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-16 22:04:44,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-16 22:04:44,298] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-16 22:04:44,298] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-16 22:04:44,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-16 22:04:44,301] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-16 22:04:44,301] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
0: [2023-03-16 22:04:44,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-16 22:04:44,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-16 22:04:44,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-16 22:04:44,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-16 22:04:44,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-16 22:04:44,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-16 22:04:44,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-16 22:04:44,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-16 22:04:44,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-16 22:04:44,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-16 22:04:44,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-16 22:04:44,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
5: [2023-03-16 22:04:44,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-16 22:04:44,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-16 22:04:44,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-16 22:04:44,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-16 22:04:44,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-16 22:04:44,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-16 22:04:44,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-16 22:04:44,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-16 22:04:44,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-16 22:04:44,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-16 22:04:44,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-16 22:04:44,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
2: [2023-03-16 22:04:44,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-16 22:04:44,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-16 22:04:44,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-16 22:04:44,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-16 22:04:44,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-16 22:04:44,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-16 22:04:44,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-16 22:04:44,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-16 22:04:44,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-16 22:04:44,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-16 22:04:44,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-16 22:04:44,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-16 22:04:44,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-16 22:04:44,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-16 22:04:44,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-16 22:04:44,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-16 22:04:44,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-16 22:04:44,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-16 22:04:44,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-16 22:04:44,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-16 22:04:44,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-16 22:04:44,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-16 22:04:44,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-16 22:04:44,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-16 22:04:44,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-16 22:04:44,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-16 22:04:44,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-16 22:04:44,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-16 22:04:44,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-16 22:04:44,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-16 22:04:44,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-16 22:04:44,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-16 22:04:44,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-16 22:04:44,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-16 22:04:44,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-16 22:04:44,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-16 22:04:44,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-16 22:04:44,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-16 22:04:44,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-16 22:04:44,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-16 22:04:44,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-16 22:04:44,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-16 22:04:44,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-16 22:04:44,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-16 22:04:44,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
7: [2023-03-16 22:04:44,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-16 22:04:44,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-16 22:04:44,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-16 22:04:44,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-16 22:04:44,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-16 22:04:44,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-16 22:04:44,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-16 22:04:44,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-16 22:04:44,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-16 22:04:44,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-16 22:04:44,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-16 22:04:44,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-16 22:04:44,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-16 22:04:44,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-16 22:04:44,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-16 22:04:44,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-16 22:04:44,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-16 22:04:44,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-16 22:04:44,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-16 22:04:44,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-16 22:04:44,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-16 22:04:44,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-16 22:04:44,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-16 22:04:44,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-16 22:04:44,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-16 22:04:44,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-16 22:04:44,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-16 22:04:44,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-16 22:04:44,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-16 22:04:44,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-16 22:04:44,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-16 22:04:44,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-16 22:04:44,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-16 22:04:44,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-16 22:04:44,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
3: [2023-03-16 22:04:44,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-16 22:04:44,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-16 22:04:44,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-16 22:04:44,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-16 22:04:44,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-16 22:04:44,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-16 22:04:44,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-16 22:04:44,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-16 22:04:44,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-16 22:04:44,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-16 22:04:44,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-16 22:04:44,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-16 22:04:44,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-16 22:04:44,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-16 22:04:44,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-16 22:04:44,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-16 22:04:44,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-16 22:04:44,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
0: successfully saved checkpoint at iteration 30000 to checkpoints_146m60b100m 7: time (ms) | save-checkpoint: 431.72 7: iteration 30100/ 115203 | consumed samples: 7705600 | consumed tokens: 15781068800 | elapsed time per iteration (s): 0.38 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 3.142265E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.438 | TFLOPs: 31.39 | 7: iteration 30200/ 115203 | consumed samples: 7731200 | consumed tokens: 15833497600 | elapsed time per iteration (s): 0.38 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 3.151393E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.003 | TFLOPs: 31.83 | 7: iteration 30300/ 115203 | consumed samples: 7756800 | consumed tokens: 15885926400 | elapsed time per iteration (s): 0.38 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 3.148736E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.628 | TFLOPs: 31.54 | 7: iteration 30400/ 115203 | consumed samples: 7782400 | consumed tokens: 15938355200 | elapsed time per iteration (s): 0.38 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 3.142936E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.386 | TFLOPs: 31.71 | 7: iteration 30500/ 115203 | consumed samples: 7808000 | consumed tokens: 15990784000 | elapsed time per iteration (s): 0.38 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 3.145670E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.743 | TFLOPs: 31.82 | 7: iteration 30600/ 115203 | consumed samples: 7833600 | consumed tokens: 16043212800 | elapsed time per iteration (s): 0.38 | learning 
rate: 1.720E-04 | global batch size: 256 | lm loss: 3.140629E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.924 | TFLOPs: 31.83 | 7: iteration 30700/ 115203 | consumed samples: 7859200 | consumed tokens: 16095641600 | elapsed time per iteration (s): 0.38 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 3.142969E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.957 | TFLOPs: 31.55 | 7: iteration 30800/ 115203 | consumed samples: 7884800 | consumed tokens: 16148070400 | elapsed time per iteration (s): 0.38 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 3.143951E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.774 | TFLOPs: 31.78 | 7: iteration 30900/ 115203 | consumed samples: 7910400 | consumed tokens: 16200499200 | elapsed time per iteration (s): 0.38 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 3.140463E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.085 | TFLOPs: 31.37 | 7: iteration 31000/ 115203 | consumed samples: 7936000 | consumed tokens: 16252928000 | elapsed time per iteration (s): 0.38 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 3.142645E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.036 | TFLOPs: 31.69 | 7: iteration 31100/ 115203 | consumed samples: 7961600 | consumed tokens: 16305356800 | elapsed time per iteration (s): 0.38 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 3.139933E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.412 | TFLOPs: 31.81 | 7: iteration 31200/ 115203 | consumed 
samples: 7987200 | consumed tokens: 16357785600 | elapsed time per iteration (s): 0.38 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 3.137532E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.392 | TFLOPs: 31.66 | 7: iteration 31300/ 115203 | consumed samples: 8012800 | consumed tokens: 16410214400 | elapsed time per iteration (s): 0.38 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 3.139804E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.363 | TFLOPs: 31.57 | 7: iteration 31400/ 115203 | consumed samples: 8038400 | consumed tokens: 16462643200 | elapsed time per iteration (s): 0.38 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 3.136203E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.406 | TFLOPs: 31.57 | 7: iteration 31500/ 115203 | consumed samples: 8064000 | consumed tokens: 16515072000 | elapsed time per iteration (s): 0.38 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 3.142650E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.063 | TFLOPs: 31.46 | 7: iteration 31600/ 115203 | consumed samples: 8089600 | consumed tokens: 16567500800 | elapsed time per iteration (s): 0.38 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 3.135335E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.227 | TFLOPs: 31.75 | 7: iteration 31700/ 115203 | consumed samples: 8115200 | consumed tokens: 16619929600 | elapsed time per iteration (s): 0.38 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 3.135554E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 681.113 | TFLOPs: 31.79 | 7: iteration 31800/ 115203 | consumed samples: 8140800 | consumed tokens: 16672358400 | elapsed time per iteration (s): 0.38 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 3.135486E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.606 | TFLOPs: 31.86 | 7: iteration 31900/ 115203 | consumed samples: 8166400 | consumed tokens: 16724787200 | elapsed time per iteration (s): 0.37 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 3.131100E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.802 | TFLOPs: 31.87 | 0: [2023-03-16 22:17:18,008] [INFO] [logging.py:68:log_dist] [Rank 0] step=32000, skipped=0, lr=[0.00016941764143236279, 0.00016941764143236279, 0.00016941764143236279], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 32000/ 115203 | consumed samples: 8192000 | consumed tokens: 16777216000 | elapsed time per iteration (s): 0.37 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 3.132502E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.853 | TFLOPs: 31.87 | 0: steps: 32000 loss: 3.0985 iter time (s): 0.375 samples/sec: 682.039 7: iteration 32100/ 115203 | consumed samples: 8217600 | consumed tokens: 16829644800 | elapsed time per iteration (s): 0.38 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 3.139013E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.368 | TFLOPs: 31.85 | 7: iteration 32200/ 115203 | consumed samples: 8243200 | consumed tokens: 16882073600 | elapsed time per iteration (s): 0.38 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 3.137553E+00 | grad norm: 0.277 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.500 | TFLOPs: 31.86 | 7: iteration 32300/ 115203 | consumed samples: 8268800 | consumed tokens: 16934502400 | elapsed time per iteration (s): 0.38 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 3.134233E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.734 | TFLOPs: 31.63 | 7: iteration 32400/ 115203 | consumed samples: 8294400 | consumed tokens: 16986931200 | elapsed time per iteration (s): 0.39 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 3.129110E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.644 | TFLOPs: 30.98 | 7: iteration 32500/ 115203 | consumed samples: 8320000 | consumed tokens: 17039360000 | elapsed time per iteration (s): 0.38 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 3.130789E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.573 | TFLOPs: 31.07 | 7: iteration 32600/ 115203 | consumed samples: 8345600 | consumed tokens: 17091788800 | elapsed time per iteration (s): 0.38 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 3.135242E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.637 | TFLOPs: 31.26 | 7: iteration 32700/ 115203 | consumed samples: 8371200 | consumed tokens: 17144217600 | elapsed time per iteration (s): 0.38 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 3.127797E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.562 | TFLOPs: 31.21 | 7: iteration 32800/ 115203 | consumed samples: 8396800 | consumed tokens: 17196646400 | elapsed time per iteration (s): 0.38 | learning rate: 1.679E-04 | 
global batch size: 256 | lm loss: 3.127784E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.813 | TFLOPs: 31.36 | 7: iteration 32900/ 115203 | consumed samples: 8422400 | consumed tokens: 17249075200 | elapsed time per iteration (s): 0.38 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 3.126344E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.531 | TFLOPs: 31.20 | 7: iteration 33000/ 115203 | consumed samples: 8448000 | consumed tokens: 17301504000 | elapsed time per iteration (s): 0.38 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 3.128718E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.268 | TFLOPs: 31.43 | 7: iteration 33100/ 115203 | consumed samples: 8473600 | consumed tokens: 17353932800 | elapsed time per iteration (s): 0.38 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 3.127798E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.869 | TFLOPs: 31.31 | 7: iteration 33200/ 115203 | consumed samples: 8499200 | consumed tokens: 17406361600 | elapsed time per iteration (s): 0.38 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 3.128416E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.862 | TFLOPs: 31.22 | 7: iteration 33300/ 115203 | consumed samples: 8524800 | consumed tokens: 17458790400 | elapsed time per iteration (s): 0.38 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 3.127723E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.899 | TFLOPs: 31.27 | 7: iteration 33400/ 115203 | consumed samples: 8550400 | 
consumed tokens: 17511219200 | elapsed time per iteration (s): 0.38 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 3.126086E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.922 | TFLOPs: 31.46 | 7: iteration 33500/ 115203 | consumed samples: 8576000 | consumed tokens: 17563648000 | elapsed time per iteration (s): 0.38 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 3.129164E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.797 | TFLOPs: 31.26 | 7: iteration 33600/ 115203 | consumed samples: 8601600 | consumed tokens: 17616076800 | elapsed time per iteration (s): 0.38 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 3.126123E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.182 | TFLOPs: 31.09 | 7: iteration 33700/ 115203 | consumed samples: 8627200 | consumed tokens: 17668505600 | elapsed time per iteration (s): 0.38 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 3.126758E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.829 | TFLOPs: 31.36 | 7: iteration 33800/ 115203 | consumed samples: 8652800 | consumed tokens: 17720934400 | elapsed time per iteration (s): 0.38 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 3.123887E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.832 | TFLOPs: 31.31 | 7: iteration 33900/ 115203 | consumed samples: 8678400 | consumed tokens: 17773363200 | elapsed time per iteration (s): 0.38 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 3.120798E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 672.002 | TFLOPs: 31.37 | 0: [2023-03-16 22:30:00,616] [INFO] [logging.py:68:log_dist] [Rank 0] step=34000, skipped=0, lr=[0.00016560534437138965, 0.00016560534437138965, 0.00016560534437138965], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 34000/ 115203 | consumed samples: 8704000 | consumed tokens: 17825792000 | elapsed time per iteration (s): 0.38 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 3.125171E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.482 | TFLOPs: 31.30 | 0: steps: 34000 loss: 3.1032 iter time (s): 0.380 samples/sec: 674.012 7: iteration 34100/ 115203 | consumed samples: 8729600 | consumed tokens: 17878220800 | elapsed time per iteration (s): 0.38 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 3.123733E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.777 | TFLOPs: 31.17 | 7: iteration 34200/ 115203 | consumed samples: 8755200 | consumed tokens: 17930649600 | elapsed time per iteration (s): 0.38 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 3.122541E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.875 | TFLOPs: 31.41 | 7: iteration 34300/ 115203 | consumed samples: 8780800 | consumed tokens: 17983078400 | elapsed time per iteration (s): 0.38 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 3.120808E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.779 | TFLOPs: 31.31 | 7: iteration 34400/ 115203 | consumed samples: 8806400 | consumed tokens: 18035507200 | elapsed time per iteration (s): 0.38 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 3.121784E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 673.752 | TFLOPs: 31.45 | 7: iteration 34500/ 115203 | consumed samples: 8832000 | consumed tokens: 18087936000 | elapsed time per iteration (s): 0.38 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 3.122778E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.477 | TFLOPs: 31.44 | 7: iteration 34600/ 115203 | consumed samples: 8857600 | consumed tokens: 18140364800 | elapsed time per iteration (s): 0.38 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 3.119092E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.046 | TFLOPs: 31.56 | 7: iteration 34700/ 115203 | consumed samples: 8883200 | consumed tokens: 18192793600 | elapsed time per iteration (s): 0.38 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 3.119095E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.454 | TFLOPs: 31.57 | 7: iteration 34800/ 115203 | consumed samples: 8908800 | consumed tokens: 18245222400 | elapsed time per iteration (s): 0.38 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 3.120819E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.487 | TFLOPs: 31.39 | 7: iteration 34900/ 115203 | consumed samples: 8934400 | consumed tokens: 18297651200 | elapsed time per iteration (s): 0.38 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 3.119382E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.296 | TFLOPs: 31.57 | 7: iteration 35000/ 115203 | consumed samples: 8960000 | consumed tokens: 18350080000 | elapsed time per iteration (s): 0.38 | learning rate: 1.636E-04 | global batch size: 256 | 
lm loss: 3.117686E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.600 | TFLOPs: 31.67 | 7: iteration 35100/ 115203 | consumed samples: 8985600 | consumed tokens: 18402508800 | elapsed time per iteration (s): 0.38 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 3.118351E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.323 | TFLOPs: 31.47 | 7: iteration 35200/ 115203 | consumed samples: 9011200 | consumed tokens: 18454937600 | elapsed time per iteration (s): 0.38 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 3.114703E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.047 | TFLOPs: 31.51 | 7: iteration 35300/ 115203 | consumed samples: 9036800 | consumed tokens: 18507366400 | elapsed time per iteration (s): 0.38 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 3.111748E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.371 | TFLOPs: 31.57 | 7: iteration 35400/ 115203 | consumed samples: 9062400 | consumed tokens: 18559795200 | elapsed time per iteration (s): 0.38 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 3.117537E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.423 | TFLOPs: 31.81 | 7: iteration 35500/ 115203 | consumed samples: 9088000 | consumed tokens: 18612224000 | elapsed time per iteration (s): 0.38 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 3.114328E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.599 | TFLOPs: 31.77 | 7: iteration 35600/ 115203 | consumed samples: 9113600 | consumed tokens: 
18664652800 | elapsed time per iteration (s): 0.38 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 3.110891E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.088 | TFLOPs: 31.84 | 7: iteration 35700/ 115203 | consumed samples: 9139200 | consumed tokens: 18717081600 | elapsed time per iteration (s): 0.38 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 3.111093E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.879 | TFLOPs: 31.78 | 7: iteration 35800/ 115203 | consumed samples: 9164800 | consumed tokens: 18769510400 | elapsed time per iteration (s): 0.38 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 3.116365E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.610 | TFLOPs: 31.82 | 7: iteration 35900/ 115203 | consumed samples: 9190400 | consumed tokens: 18821939200 | elapsed time per iteration (s): 0.37 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 3.108833E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.968 | TFLOPs: 31.88 | 0: [2023-03-16 22:42:37,142] [INFO] [logging.py:68:log_dist] [Rank 0] step=36000, skipped=0, lr=[0.00016162432908965068, 0.00016162432908965068, 0.00016162432908965068], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 36000/ 115203 | consumed samples: 9216000 | consumed tokens: 18874368000 | elapsed time per iteration (s): 0.38 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 3.112130E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.220 | TFLOPs: 31.84 | 0: steps: 36000 loss: 3.1242 iter time (s): 0.377 samples/sec: 679.870 7: iteration 36100/ 115203 | consumed 
samples: 9241600 | consumed tokens: 18926796800 | elapsed time per iteration (s): 0.38 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 3.116288E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.149 | TFLOPs: 31.84 | 7: iteration 36200/ 115203 | consumed samples: 9267200 | consumed tokens: 18979225600 | elapsed time per iteration (s): 0.38 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 3.113602E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.740 | TFLOPs: 31.77 | 7: iteration 36300/ 115203 | consumed samples: 9292800 | consumed tokens: 19031654400 | elapsed time per iteration (s): 0.38 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 3.109067E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.311 | TFLOPs: 31.80 | 7: iteration 36400/ 115203 | consumed samples: 9318400 | consumed tokens: 19084083200 | elapsed time per iteration (s): 0.38 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 3.111284E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.982 | TFLOPs: 31.83 | 7: iteration 36500/ 115203 | consumed samples: 9344000 | consumed tokens: 19136512000 | elapsed time per iteration (s): 0.38 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 3.113777E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.253 | TFLOPs: 31.85 | 7: iteration 36600/ 115203 | consumed samples: 9369600 | consumed tokens: 19188940800 | elapsed time per iteration (s): 0.38 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 3.109631E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 681.305 | TFLOPs: 31.80 | 7: iteration 36700/ 115203 | consumed samples: 9395200 | consumed tokens: 19241369600 | elapsed time per iteration (s): 0.37 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 3.108755E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.409 | TFLOPs: 31.90 | 7: iteration 36800/ 115203 | consumed samples: 9420800 | consumed tokens: 19293798400 | elapsed time per iteration (s): 0.38 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 3.114555E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.341 | TFLOPs: 31.76 | 7: iteration 36900/ 115203 | consumed samples: 9446400 | consumed tokens: 19346227200 | elapsed time per iteration (s): 0.38 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 3.107099E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.592 | TFLOPs: 31.86 | 7: iteration 37000/ 115203 | consumed samples: 9472000 | consumed tokens: 19398656000 | elapsed time per iteration (s): 0.38 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 3.105387E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.269 | TFLOPs: 31.85 | 7: iteration 37100/ 115203 | consumed samples: 9497600 | consumed tokens: 19451084800 | elapsed time per iteration (s): 0.38 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 3.103785E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.463 | TFLOPs: 31.81 | 7: iteration 37200/ 115203 | consumed samples: 9523200 | consumed tokens: 19503513600 | elapsed time per iteration (s): 0.38 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 
3.109965E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.939 | TFLOPs: 31.83 | 7: iteration 37300/ 115203 | consumed samples: 9548800 | consumed tokens: 19555942400 | elapsed time per iteration (s): 0.38 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 3.103632E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.705 | TFLOPs: 31.82 | 7: iteration 37400/ 115203 | consumed samples: 9574400 | consumed tokens: 19608371200 | elapsed time per iteration (s): 0.38 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 3.109356E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.813 | TFLOPs: 31.82 | 7: iteration 37500/ 115203 | consumed samples: 9600000 | consumed tokens: 19660800000 | elapsed time per iteration (s): 0.38 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 3.109124E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.902 | TFLOPs: 31.83 | 7: iteration 37600/ 115203 | consumed samples: 9625600 | consumed tokens: 19713228800 | elapsed time per iteration (s): 0.38 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 3.105298E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.341 | TFLOPs: 31.80 | 7: iteration 37700/ 115203 | consumed samples: 9651200 | consumed tokens: 19765657600 | elapsed time per iteration (s): 0.38 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 3.098805E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.342 | TFLOPs: 31.85 | 7: iteration 37800/ 115203 | consumed samples: 9676800 | consumed tokens: 19818086400 | 
elapsed time per iteration (s): 0.38 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 3.106694E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.434 | TFLOPs: 31.81 |
7: iteration 37900/ 115203 | consumed samples: 9702400 | consumed tokens: 19870515200 | elapsed time per iteration (s): 0.38 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 3.102672E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.602 | TFLOPs: 31.86 |
0: [2023-03-16 22:55:08,043] [INFO] [logging.py:68:log_dist] [Rank 0] step=38000, skipped=0, lr=[0.00015748667481842792, 0.00015748667481842792, 0.00015748667481842792], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 38000/ 115203 | consumed samples: 9728000 | consumed tokens: 19922944000 | elapsed time per iteration (s): 0.38 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 3.105317E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.096 | TFLOPs: 31.84 |
0: steps: 38000 loss: 3.1116 iter time (s): 0.373 samples/sec: 685.685
7: iteration 38100/ 115203 | consumed samples: 9753600 | consumed tokens: 19975372800 | elapsed time per iteration (s): 0.38 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 3.108598E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.224 | TFLOPs: 31.84 |
7: iteration 38200/ 115203 | consumed samples: 9779200 | consumed tokens: 20027801600 | elapsed time per iteration (s): 0.38 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 3.101750E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.998 | TFLOPs: 31.83 |
7: iteration 38300/ 115203 | consumed samples: 9804800 | consumed tokens: 20080230400 | elapsed time per iteration (s): 0.38 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 3.102094E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.137 | TFLOPs: 31.79 |
7: iteration 38400/ 115203 | consumed samples: 9830400 | consumed tokens: 20132659200 | elapsed time per iteration (s): 0.38 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 3.100015E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.370 | TFLOPs: 31.85 |
7: iteration 38500/ 115203 | consumed samples: 9856000 | consumed tokens: 20185088000 | elapsed time per iteration (s): 0.38 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 3.092952E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.175 | TFLOPs: 31.84 |
7: iteration 38600/ 115203 | consumed samples: 9881600 | consumed tokens: 20237516800 | elapsed time per iteration (s): 0.38 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 3.101250E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.207 | TFLOPs: 31.80 |
7: iteration 38700/ 115203 | consumed samples: 9907200 | consumed tokens: 20289945600 | elapsed time per iteration (s): 0.38 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 3.103126E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.339 | TFLOPs: 31.85 |
7: iteration 38800/ 115203 | consumed samples: 9932800 | consumed tokens: 20342374400 | elapsed time per iteration (s): 0.38 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 3.099786E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.658 | TFLOPs: 31.82 |
7: iteration 38900/ 115203 | consumed samples: 9958400 | consumed tokens: 20394803200 | elapsed time per iteration (s): 0.38 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 3.096346E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.782 | TFLOPs: 31.78 |
7: iteration 39000/ 115203 | consumed samples: 9984000 | consumed tokens: 20447232000 | elapsed time per iteration (s): 0.38 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 3.100576E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.470 | TFLOPs: 31.76 |
7: iteration 39100/ 115203 | consumed samples: 10009600 | consumed tokens: 20499660800 | elapsed time per iteration (s): 0.38 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 3.099673E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.678 | TFLOPs: 31.77 |
7: iteration 39200/ 115203 | consumed samples: 10035200 | consumed tokens: 20552089600 | elapsed time per iteration (s): 0.38 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 3.098156E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.063 | TFLOPs: 31.74 |
7: iteration 39300/ 115203 | consumed samples: 10060800 | consumed tokens: 20604518400 | elapsed time per iteration (s): 0.38 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 3.096743E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.372 | TFLOPs: 31.66 |
7: iteration 39400/ 115203 | consumed samples: 10086400 | consumed tokens: 20656947200 | elapsed time per iteration (s): 0.38 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 3.098180E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.367 | TFLOPs: 31.52 |
7: iteration 39500/ 115203 | consumed samples: 10112000 | consumed tokens: 20709376000 | elapsed time per iteration (s): 0.38 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 3.096275E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.773 | TFLOPs: 31.73 |
7: iteration 39600/ 115203 | consumed samples: 10137600 | consumed tokens: 20761804800 | elapsed time per iteration (s): 0.38 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 3.100927E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.702 | TFLOPs: 31.82 |
7: iteration 39700/ 115203 | consumed samples: 10163200 | consumed tokens: 20814233600 | elapsed time per iteration (s): 0.38 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 3.095899E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.747 | TFLOPs: 31.82 |
7: iteration 39800/ 115203 | consumed samples: 10188800 | consumed tokens: 20866662400 | elapsed time per iteration (s): 0.38 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 3.092916E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.161 | TFLOPs: 31.84 |
7: iteration 39900/ 115203 | consumed samples: 10214400 | consumed tokens: 20919091200 | elapsed time per iteration (s): 0.38 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 3.094998E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.290 | TFLOPs: 31.85 |
0: [2023-03-16 23:07:39,828] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=0, lr=[0.0001532049360643911, 0.0001532049360643911, 0.0001532049360643911], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 40000/ 115203 | consumed samples: 10240000 | consumed tokens: 20971520000 | elapsed time per iteration (s): 0.38 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 3.092971E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.459 | TFLOPs: 31.85 |
0: steps: 40000 loss: 3.0805 iter time (s): 0.374 samples/sec: 684.822
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 40000 | lm loss value: 3.810711E+00 | lm loss PPL: 4.518257E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 40000 to checkpoints_146m60b100m
0: [2023-03-16 23:07:39,957] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step40000 is begin to save!
0: [2023-03-16 23:07:39,961] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_01-model_00-model_states.pt...
0: [2023-03-16 23:07:40,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_01-model_00-model_states.pt.
0: [2023-03-16 23:07:40,055] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_03-model_00-model_states.pt...
0: [2023-03-16 23:07:40,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_03-model_00-model_states.pt.
0: [2023-03-16 23:07:40,070] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_04-model_00-model_states.pt...
0: [2023-03-16 23:07:40,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_04-model_00-model_states.pt. 0: [2023-03-16 23:07:40,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_05-model_00-model_states.pt... 0: [2023-03-16 23:07:40,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_05-model_00-model_states.pt. 0: [2023-03-16 23:07:40,100] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_06-model_00-model_states.pt... 0: [2023-03-16 23:07:40,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_06-model_00-model_states.pt. 0: [2023-03-16 23:07:40,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_07-model_00-model_states.pt... 0: [2023-03-16 23:07:40,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_07-model_00-model_states.pt. 0: [2023-03-16 23:07:40,130] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_08-model_00-model_states.pt... 0: [2023-03-16 23:07:40,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_08-model_00-model_states.pt. 0: [2023-03-16 23:07:40,145] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_09-model_00-model_states.pt... 0: [2023-03-16 23:07:40,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_09-model_00-model_states.pt. 0: [2023-03-16 23:07:40,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_10-model_00-model_states.pt... 
0: [2023-03-16 23:07:40,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_10-model_00-model_states.pt. 0: [2023-03-16 23:07:40,174] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_11-model_00-model_states.pt... 0: [2023-03-16 23:07:40,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_11-model_00-model_states.pt. 0: [2023-03-16 23:07:40,189] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_12-model_00-model_states.pt... 0: [2023-03-16 23:07:40,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_12-model_00-model_states.pt. 0: [2023-03-16 23:07:40,204] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_13-model_00-model_states.pt... 0: [2023-03-16 23:07:40,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_13-model_00-model_states.pt. 0: [2023-03-16 23:07:40,219] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_14-model_00-model_states.pt... 0: [2023-03-16 23:07:40,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_14-model_00-model_states.pt. 0: [2023-03-16 23:07:40,233] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_15-model_00-model_states.pt... 0: [2023-03-16 23:07:40,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_15-model_00-model_states.pt. 0: [2023-03-16 23:07:40,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_16-model_00-model_states.pt... 
0: [2023-03-16 23:07:40,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_16-model_00-model_states.pt. 0: [2023-03-16 23:07:40,263] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_17-model_00-model_states.pt... 0: [2023-03-16 23:07:40,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_17-model_00-model_states.pt. 0: [2023-03-16 23:07:40,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/layer_19-model_00-model_states.pt... 0: [2023-03-16 23:07:40,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/layer_19-model_00-model_states.pt. 0: [2023-03-16 23:07:40,280] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step40000/mp_rank_00_model_states.pt 0: [2023-03-16 23:07:40,280] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/mp_rank_00_model_states.pt... 0: [2023-03-16 23:07:40,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/mp_rank_00_model_states.pt. 0: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 5: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 7: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 3: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 4: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 7: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 0: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
3: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 4: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 5: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 7: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 6: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 4: [2023-03-16 23:07:40,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2023-03-16 23:07:40,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-16 23:07:40,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-16 23:07:40,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-16 23:07:40,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-16 23:07:40,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-16 23:07:40,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-16 23:07:40,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-16 23:07:40,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-16 23:07:40,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2023-03-16 23:07:40,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-16 23:07:40,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 6: [2023-03-16 23:07:40,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-16 23:07:40,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-16 23:07:40,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-16 23:07:40,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2023-03-16 23:07:40,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
6: [2023-03-16 23:07:40,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-16 23:07:40,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-16 23:07:40,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-16 23:07:40,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-16 23:07:40,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-16 23:07:40,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-16 23:07:40,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-16 23:07:40,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-16 23:07:40,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-16 23:07:40,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-16 23:07:40,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-16 23:07:40,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
1: [2023-03-16 23:07:40,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-16 23:07:40,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-16 23:07:40,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-16 23:07:40,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-16 23:07:40,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-16 23:07:40,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-16 23:07:40,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-16 23:07:40,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-16 23:07:40,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-16 23:07:40,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
6: [2023-03-16 23:07:40,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 0: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-16 23:07:40,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-16 23:07:40,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-16 23:07:40,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-16 23:07:40,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 6: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
0: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-16 23:07:40,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-16 23:07:40,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-16 23:07:40,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2023-03-16 23:07:40,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 6: [2023-03-16 23:07:40,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-16 23:07:40,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-16 23:07:40,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-16 23:07:40,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
1: [2023-03-16 23:07:40,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-16 23:07:40,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-16 23:07:40,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-16 23:07:40,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-16 23:07:40,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-16 23:07:40,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-16 23:07:40,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-16 23:07:40,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-16 23:07:40,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-16 23:07:40,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-16 23:07:40,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-16 23:07:40,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-16 23:07:40,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-16 23:07:40,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-16 23:07:40,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-16 23:07:40,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-16 23:07:40,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-16 23:07:40,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-16 23:07:40,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-16 23:07:40,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-16 23:07:40,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-16 23:07:40,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-16 23:07:40,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-16 23:07:40,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-16 23:07:40,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-16 23:07:40,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-16 23:07:40,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-16 23:07:40,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-16 23:07:40,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-16 23:07:40,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-16 23:07:40,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-16 23:07:40,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-16 23:07:40,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-16 23:07:40,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-16 23:07:40,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-16 23:07:40,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-16 23:07:40,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
7: [2023-03-16 23:07:40,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-16 23:07:40,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-16 23:07:40,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-16 23:07:40,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-16 23:07:40,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-16 23:07:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-16 23:07:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-16 23:07:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-16 23:07:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-16 23:07:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-16 23:07:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-16 23:07:40,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-16 23:07:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-16 23:07:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-16 23:07:40,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-16 23:07:40,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-16 23:07:40,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-16 23:07:40,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
4: [2023-03-16 23:07:40,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-16 23:07:40,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-16 23:07:40,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-16 23:07:40,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-16 23:07:40,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-16 23:07:40,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-16 23:07:40,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-16 23:07:40,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-16 23:07:40,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-16 23:07:40,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-16 23:07:40,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-16 23:07:40,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-16 23:07:40,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
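Note on reading the iteration records that follow: each one reports `consumed samples` and `consumed tokens`, and in this run those counters appear to satisfy samples = iteration x 256 (the global batch size) and tokens = samples x 2048 (the `--seq-length` from the launch command). The sketch below is one way to parse such records and check that relationship. It is not part of Megatron-DeepSpeed; the names `RECORD`, `parse_records`, and `check_counters` are hypothetical, and the regular expression assumes the field order and separators seen in this log.

```python
import re

# Assumption: records look like
#   "iteration N/ M | consumed samples: S | consumed tokens: T | ... | TFLOPs: X |"
# as in this log. re.DOTALL lets a record span a wrapped line.
RECORD = re.compile(
    r"iteration\s+(\d+)/\s*(\d+)\s*\|"
    r"\s*consumed samples:\s*(\d+)\s*\|"
    r"\s*consumed tokens:\s*(\d+)\s*\|"
    r".*?lm loss:\s*([0-9.E+-]+)"
    r".*?samples per second:\s*([0-9.]+)"
    r".*?TFLOPs:\s*([0-9.]+)",
    re.DOTALL,
)


def parse_records(text):
    """Return one dict per iteration record found in the log text."""
    records = []
    for m in RECORD.finditer(text):
        records.append({
            "iteration": int(m.group(1)),
            "total_iterations": int(m.group(2)),
            "consumed_samples": int(m.group(3)),
            "consumed_tokens": int(m.group(4)),
            "lm_loss": float(m.group(5)),
            "samples_per_second": float(m.group(6)),
            "tflops": float(m.group(7)),
        })
    return records


def check_counters(rec, global_batch_size=256, seq_length=2048):
    """True if the counters match a constant batch size and sequence length.

    Assumption: no batch-size ramp-up, so samples = iteration * batch size.
    """
    return (
        rec["consumed_samples"] == rec["iteration"] * global_batch_size
        and rec["consumed_tokens"] == rec["consumed_samples"] * seq_length
    )


# One record copied verbatim from this log.
SAMPLE = (
    "7: iteration 40100/ 115203 | consumed samples: 10265600 | "
    "consumed tokens: 21023948800 | elapsed time per iteration (s): 0.38 | "
    "learning rate: 1.530E-04 | global batch size: 256 | "
    "lm loss: 3.094862E+00 | grad norm: 0.282 | num zeros: 0.0 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 | "
    "samples per second: 672.698 | TFLOPs: 31.40 |"
)

if __name__ == "__main__":
    for rec in parse_records(SAMPLE):
        print(rec["iteration"], rec["lm_loss"], check_counters(rec))
```

A record that is missing one of the later fields would cause the lazy `.*?` to run into the next record, so treat the parser as a starting point for clean logs rather than a robust tool.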
0: successfully saved checkpoint at iteration 40000 to checkpoints_146m60b100m 7: time (ms) | save-checkpoint: 432.77 7: iteration 40100/ 115203 | consumed samples: 10265600 | consumed tokens: 21023948800 | elapsed time per iteration (s): 0.38 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 3.094862E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.698 | TFLOPs: 31.40 | 7: iteration 40200/ 115203 | consumed samples: 10291200 | consumed tokens: 21076377600 | elapsed time per iteration (s): 0.38 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 3.092217E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.006 | TFLOPs: 31.79 | 7: iteration 40300/ 115203 | consumed samples: 10316800 | consumed tokens: 21128806400 | elapsed time per iteration (s): 0.38 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 3.093407E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.566 | TFLOPs: 31.77 | 7: iteration 40400/ 115203 | consumed samples: 10342400 | consumed tokens: 21181235200 | elapsed time per iteration (s): 0.38 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 3.090409E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.605 | TFLOPs: 31.72 | 7: iteration 40500/ 115203 | consumed samples: 10368000 | consumed tokens: 21233664000 | elapsed time per iteration (s): 0.38 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 3.095532E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.678 | TFLOPs: 31.63 | 7: iteration 40600/ 115203 | consumed samples: 10393600 | consumed tokens: 21286092800 | elapsed time per iteration (s): 0.38 | 
learning rate: 1.519E-04 | global batch size: 256 | lm loss: 3.090684E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.481 | TFLOPs: 31.76 | 7: iteration 40700/ 115203 | consumed samples: 10419200 | consumed tokens: 21338521600 | elapsed time per iteration (s): 0.38 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 3.089868E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.086 | TFLOPs: 31.60 | 7: iteration 40800/ 115203 | consumed samples: 10444800 | consumed tokens: 21390950400 | elapsed time per iteration (s): 0.38 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 3.089403E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.663 | TFLOPs: 31.72 | 7: iteration 40900/ 115203 | consumed samples: 10470400 | consumed tokens: 21443379200 | elapsed time per iteration (s): 0.38 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 3.092189E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.356 | TFLOPs: 31.80 | 7: iteration 41000/ 115203 | consumed samples: 10496000 | consumed tokens: 21495808000 | elapsed time per iteration (s): 0.38 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 3.090455E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.809 | TFLOPs: 31.59 | 7: iteration 41100/ 115203 | consumed samples: 10521600 | consumed tokens: 21548236800 | elapsed time per iteration (s): 0.38 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 3.092723E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.825 | TFLOPs: 31.50 | 7: iteration 41200/ 115203 
| consumed samples: 10547200 | consumed tokens: 21600665600 | elapsed time per iteration (s): 0.38 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 3.088025E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.562 | TFLOPs: 31.07 | 7: iteration 41300/ 115203 | consumed samples: 10572800 | consumed tokens: 21653094400 | elapsed time per iteration (s): 0.38 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 3.085857E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.263 | TFLOPs: 31.61 | 7: iteration 41400/ 115203 | consumed samples: 10598400 | consumed tokens: 21705523200 | elapsed time per iteration (s): 0.38 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 3.086017E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.476 | TFLOPs: 31.76 | 7: iteration 41500/ 115203 | consumed samples: 10624000 | consumed tokens: 21757952000 | elapsed time per iteration (s): 0.38 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 3.088545E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.570 | TFLOPs: 31.16 | 7: iteration 41600/ 115203 | consumed samples: 10649600 | consumed tokens: 21810380800 | elapsed time per iteration (s): 0.39 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 3.084211E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 652.396 | TFLOPs: 30.45 | 7: iteration 41700/ 115203 | consumed samples: 10675200 | consumed tokens: 21862809600 | elapsed time per iteration (s): 0.43 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 3.084654E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 599.329 | TFLOPs: 27.97 | 7: iteration 41800/ 115203 | consumed samples: 10700800 | consumed tokens: 21915238400 | elapsed time per iteration (s): 0.38 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 3.086947E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.202 | TFLOPs: 31.70 | 7: iteration 41900/ 115203 | consumed samples: 10726400 | consumed tokens: 21967667200 | elapsed time per iteration (s): 0.38 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 3.083287E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.940 | TFLOPs: 31.64 | 0: [2023-03-16 23:20:22,207] [INFO] [logging.py:68:log_dist] [Rank 0] step=42000, skipped=0, lr=[0.0001487921045166041, 0.0001487921045166041, 0.0001487921045166041], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 42000/ 115203 | consumed samples: 10752000 | consumed tokens: 22020096000 | elapsed time per iteration (s): 0.38 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 3.082399E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.789 | TFLOPs: 31.78 | 0: steps: 42000 loss: 3.0843 iter time (s): 0.379 samples/sec: 675.702 7: iteration 42100/ 115203 | consumed samples: 10777600 | consumed tokens: 22072524800 | elapsed time per iteration (s): 0.38 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 3.085389E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.703 | TFLOPs: 31.77 | 7: iteration 42200/ 115203 | consumed samples: 10803200 | consumed tokens: 22124953600 | elapsed time per iteration (s): 0.38 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 3.085146E+00 | grad norm: 0.291 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.827 | TFLOPs: 31.78 | 7: iteration 42300/ 115203 | consumed samples: 10828800 | consumed tokens: 22177382400 | elapsed time per iteration (s): 0.38 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 3.085752E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.572 | TFLOPs: 31.81 | 7: iteration 42400/ 115203 | consumed samples: 10854400 | consumed tokens: 22229811200 | elapsed time per iteration (s): 0.38 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 3.085936E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.014 | TFLOPs: 31.83 | 7: iteration 42500/ 115203 | consumed samples: 10880000 | consumed tokens: 22282240000 | elapsed time per iteration (s): 0.38 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 3.084120E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.521 | TFLOPs: 31.81 | 7: iteration 42600/ 115203 | consumed samples: 10905600 | consumed tokens: 22334668800 | elapsed time per iteration (s): 0.38 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 3.085203E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.133 | TFLOPs: 31.84 | 7: iteration 42700/ 115203 | consumed samples: 10931200 | consumed tokens: 22387097600 | elapsed time per iteration (s): 0.38 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 3.081561E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.529 | TFLOPs: 31.39 | 7: iteration 42800/ 115203 | consumed samples: 10956800 | consumed tokens: 22439526400 | elapsed time per iteration (s): 0.38 | learning 
rate: 1.470E-04 | global batch size: 256 | lm loss: 3.083801E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.305 | TFLOPs: 31.33 | 7: iteration 42900/ 115203 | consumed samples: 10982400 | consumed tokens: 22491955200 | elapsed time per iteration (s): 0.38 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 3.081451E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.789 | TFLOPs: 31.31 | 7: iteration 43000/ 115203 | consumed samples: 11008000 | consumed tokens: 22544384000 | elapsed time per iteration (s): 0.38 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 3.078773E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.455 | TFLOPs: 31.34 | 7: iteration 43100/ 115203 | consumed samples: 11033600 | consumed tokens: 22596812800 | elapsed time per iteration (s): 0.38 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 3.080587E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.778 | TFLOPs: 31.26 | 7: iteration 43200/ 115203 | consumed samples: 11059200 | consumed tokens: 22649241600 | elapsed time per iteration (s): 0.38 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 3.081078E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.930 | TFLOPs: 31.46 | 7: iteration 43300/ 115203 | consumed samples: 11084800 | consumed tokens: 22701670400 | elapsed time per iteration (s): 0.38 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 3.080695E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.614 | TFLOPs: 31.54 | 7: iteration 43400/ 115203 | 
consumed samples: 11110400 | consumed tokens: 22754099200 | elapsed time per iteration (s): 0.38 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 3.079598E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.911 | TFLOPs: 31.55 | 7: iteration 43500/ 115203 | consumed samples: 11136000 | consumed tokens: 22806528000 | elapsed time per iteration (s): 0.38 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 3.082151E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.451 | TFLOPs: 31.57 | 7: iteration 43600/ 115203 | consumed samples: 11161600 | consumed tokens: 22858956800 | elapsed time per iteration (s): 0.38 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 3.078404E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.908 | TFLOPs: 31.46 | 7: iteration 43700/ 115203 | consumed samples: 11187200 | consumed tokens: 22911385600 | elapsed time per iteration (s): 0.38 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 3.080914E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.314 | TFLOPs: 31.47 | 7: iteration 43800/ 115203 | consumed samples: 11212800 | consumed tokens: 22963814400 | elapsed time per iteration (s): 0.38 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 3.081024E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.675 | TFLOPs: 31.49 | 7: iteration 43900/ 115203 | consumed samples: 11238400 | consumed tokens: 23016243200 | elapsed time per iteration (s): 0.38 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 3.082698E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 673.831 | TFLOPs: 31.45 | 0: [2023-03-16 23:32:59,834] [INFO] [logging.py:68:log_dist] [Rank 0] step=44000, skipped=0, lr=[0.00014426156962702883, 0.00014426156962702883, 0.00014426156962702883], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 44000/ 115203 | consumed samples: 11264000 | consumed tokens: 23068672000 | elapsed time per iteration (s): 0.38 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 3.079236E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.133 | TFLOPs: 31.42 | 0: steps: 44000 loss: 3.0723 iter time (s): 0.377 samples/sec: 679.581 7: iteration 44100/ 115203 | consumed samples: 11289600 | consumed tokens: 23121100800 | elapsed time per iteration (s): 0.38 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 3.079429E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.789 | TFLOPs: 31.50 | 7: iteration 44200/ 115203 | consumed samples: 11315200 | consumed tokens: 23173529600 | elapsed time per iteration (s): 0.38 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 3.075340E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.333 | TFLOPs: 31.52 | 7: iteration 44300/ 115203 | consumed samples: 11340800 | consumed tokens: 23225958400 | elapsed time per iteration (s): 0.38 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 3.079666E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.969 | TFLOPs: 31.55 | 7: iteration 44400/ 115203 | consumed samples: 11366400 | consumed tokens: 23278387200 | elapsed time per iteration (s): 0.38 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 3.075562E+00 | grad norm: 0.292 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.349 | TFLOPs: 31.57 | 7: iteration 44500/ 115203 | consumed samples: 11392000 | consumed tokens: 23330816000 | elapsed time per iteration (s): 0.38 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 3.077431E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.390 | TFLOPs: 31.62 | 7: iteration 44600/ 115203 | consumed samples: 11417600 | consumed tokens: 23383244800 | elapsed time per iteration (s): 0.38 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 3.078321E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.952 | TFLOPs: 31.27 | 7: iteration 44700/ 115203 | consumed samples: 11443200 | consumed tokens: 23435673600 | elapsed time per iteration (s): 0.38 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 3.073980E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.987 | TFLOPs: 31.51 | 7: iteration 44800/ 115203 | consumed samples: 11468800 | consumed tokens: 23488102400 | elapsed time per iteration (s): 0.38 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 3.074407E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.540 | TFLOPs: 31.67 | 7: iteration 44900/ 115203 | consumed samples: 11494400 | consumed tokens: 23540531200 | elapsed time per iteration (s): 0.38 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 3.073847E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.334 | TFLOPs: 31.57 | 7: iteration 45000/ 115203 | consumed samples: 11520000 | consumed tokens: 23592960000 | elapsed time per iteration (s): 0.38 | 
learning rate: 1.420E-04 | global batch size: 256 | lm loss: 3.071671E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.623 | TFLOPs: 31.72 | 7: iteration 45100/ 115203 | consumed samples: 11545600 | consumed tokens: 23645388800 | elapsed time per iteration (s): 0.38 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 3.073258E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.891 | TFLOPs: 31.55 | 7: iteration 45200/ 115203 | consumed samples: 11571200 | consumed tokens: 23697817600 | elapsed time per iteration (s): 0.38 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 3.076125E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.095 | TFLOPs: 31.65 | 7: iteration 45300/ 115203 | consumed samples: 11596800 | consumed tokens: 23750246400 | elapsed time per iteration (s): 0.38 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 3.070214E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.807 | TFLOPs: 31.73 | 7: iteration 45400/ 115203 | consumed samples: 11622400 | consumed tokens: 23802675200 | elapsed time per iteration (s): 0.38 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 3.068241E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.142 | TFLOPs: 31.75 | 7: iteration 45500/ 115203 | consumed samples: 11648000 | consumed tokens: 23855104000 | elapsed time per iteration (s): 0.38 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 3.069749E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.161 | TFLOPs: 31.79 | 7: iteration 45600/ 115203 
| consumed samples: 11673600 | consumed tokens: 23907532800 | elapsed time per iteration (s): 0.38 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 3.071538E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.104 | TFLOPs: 31.70 | 7: iteration 45700/ 115203 | consumed samples: 11699200 | consumed tokens: 23959961600 | elapsed time per iteration (s): 0.38 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 3.071055E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.437 | TFLOPs: 31.71 | 7: iteration 45800/ 115203 | consumed samples: 11724800 | consumed tokens: 24012390400 | elapsed time per iteration (s): 0.38 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 3.072962E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.337 | TFLOPs: 31.80 | 7: iteration 45900/ 115203 | consumed samples: 11750400 | consumed tokens: 24064819200 | elapsed time per iteration (s): 0.38 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 3.069200E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.041 | TFLOPs: 31.84 | 0: [2023-03-16 23:45:35,343] [INFO] [logging.py:68:log_dist] [Rank 0] step=46000, skipped=0, lr=[0.0001396270779841331, 0.0001396270779841331, 0.0001396270779841331], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 46000/ 115203 | consumed samples: 11776000 | consumed tokens: 24117248000 | elapsed time per iteration (s): 0.38 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 3.067166E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.725 | TFLOPs: 31.63 | 0: steps: 46000 loss: 3.0182 iter time (s): 0.376 samples/sec: 
681.554 7: iteration 46100/ 115203 | consumed samples: 11801600 | consumed tokens: 24169676800 | elapsed time per iteration (s): 0.38 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 3.071765E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.054 | TFLOPs: 31.79 | 7: iteration 46200/ 115203 | consumed samples: 11827200 | consumed tokens: 24222105600 | elapsed time per iteration (s): 0.38 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 3.068814E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.974 | TFLOPs: 31.83 | 7: iteration 46300/ 115203 | consumed samples: 11852800 | consumed tokens: 24274534400 | elapsed time per iteration (s): 0.38 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 3.070567E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.532 | TFLOPs: 31.76 | 7: iteration 46400/ 115203 | consumed samples: 11878400 | consumed tokens: 24326963200 | elapsed time per iteration (s): 0.38 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 3.068769E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.922 | TFLOPs: 31.83 | 7: iteration 46500/ 115203 | consumed samples: 11904000 | consumed tokens: 24379392000 | elapsed time per iteration (s): 0.38 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 3.067914E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.292 | TFLOPs: 31.85 | 7: iteration 46600/ 115203 | consumed samples: 11929600 | consumed tokens: 24431820800 | elapsed time per iteration (s): 0.38 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 3.068047E+00 | grad norm: 0.303 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.461 | TFLOPs: 31.81 | 7: iteration 46700/ 115203 | consumed samples: 11955200 | consumed tokens: 24484249600 | elapsed time per iteration (s): 0.38 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 3.068116E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.823 | TFLOPs: 31.82 | 7: iteration 46800/ 115203 | consumed samples: 11980800 | consumed tokens: 24536678400 | elapsed time per iteration (s): 0.38 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 3.065934E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.393 | TFLOPs: 31.85 | 7: iteration 46900/ 115203 | consumed samples: 12006400 | consumed tokens: 24589107200 | elapsed time per iteration (s): 0.38 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 3.069843E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.643 | TFLOPs: 31.82 | 7: iteration 47000/ 115203 | consumed samples: 12032000 | consumed tokens: 24641536000 | elapsed time per iteration (s): 0.38 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 3.069273E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.535 | TFLOPs: 31.86 | 7: iteration 47100/ 115203 | consumed samples: 12057600 | consumed tokens: 24693964800 | elapsed time per iteration (s): 0.38 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 3.067135E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.852 | TFLOPs: 31.83 | 7: iteration 47200/ 115203 | consumed samples: 12083200 | consumed tokens: 24746393600 | elapsed time per iteration (s): 0.38 | learning 
rate: 1.368E-04 | global batch size: 256 | lm loss: 3.066056E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.474 | TFLOPs: 31.86 | 7: iteration 47300/ 115203 | consumed samples: 12108800 | consumed tokens: 24798822400 | elapsed time per iteration (s): 0.38 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 3.066519E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.509 | TFLOPs: 31.86 | 7: iteration 47400/ 115203 | consumed samples: 12134400 | consumed tokens: 24851251200 | elapsed time per iteration (s): 0.38 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 3.064034E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.086 | TFLOPs: 31.84 | 7: iteration 47500/ 115203 | consumed samples: 12160000 | consumed tokens: 24903680000 | elapsed time per iteration (s): 0.38 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 3.066262E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.147 | TFLOPs: 31.84 | 7: iteration 47600/ 115203 | consumed samples: 12185600 | consumed tokens: 24956108800 | elapsed time per iteration (s): 0.38 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 3.063256E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.261 | TFLOPs: 31.85 | 7: iteration 47700/ 115203 | consumed samples: 12211200 | consumed tokens: 25008537600 | elapsed time per iteration (s): 0.38 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 3.068125E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.557 | TFLOPs: 31.86 | 7: iteration 47800/ 115203 | 
consumed samples: 12236800 | consumed tokens: 25060966400 | elapsed time per iteration (s): 0.38 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 3.065632E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.448 | TFLOPs: 31.85 | 7: iteration 47900/ 115203 | consumed samples: 12262400 | consumed tokens: 25113395200 | elapsed time per iteration (s): 0.38 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 3.067194E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.499 | TFLOPs: 31.86 | 0: [2023-03-16 23:58:06,068] [INFO] [logging.py:68:log_dist] [Rank 0] step=48000, skipped=0, lr=[0.00013490269160287214, 0.00013490269160287214, 0.00013490269160287214], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 48000/ 115203 | consumed samples: 12288000 | consumed tokens: 25165824000 | elapsed time per iteration (s): 0.38 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 3.058994E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.679 | TFLOPs: 31.82 | 0: steps: 48000 loss: 3.0264 iter time (s): 0.373 samples/sec: 685.594 7: iteration 48100/ 115203 | consumed samples: 12313600 | consumed tokens: 25218252800 | elapsed time per iteration (s): 0.38 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 3.060908E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.439 | TFLOPs: 31.71 | 7: iteration 48200/ 115203 | consumed samples: 12339200 | consumed tokens: 25270681600 | elapsed time per iteration (s): 0.38 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 3.059460E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.623 | 
TFLOPs: 31.58 | 7: iteration 48300/ 115203 | consumed samples: 12364800 | consumed tokens: 25323110400 | elapsed time per iteration (s): 0.38 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 3.065843E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.174 | TFLOPs: 31.65 | 7: iteration 48400/ 115203 | consumed samples: 12390400 | consumed tokens: 25375539200 | elapsed time per iteration (s): 0.38 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 3.059307E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.236 | TFLOPs: 31.66 | 7: iteration 48500/ 115203 | consumed samples: 12416000 | consumed tokens: 25427968000 | elapsed time per iteration (s): 0.38 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 3.063543E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.529 | TFLOPs: 31.81 | 7: iteration 48600/ 115203 | consumed samples: 12441600 | consumed tokens: 25480396800 | elapsed time per iteration (s): 0.38 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 3.060309E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.909 | TFLOPs: 31.60 | 7: iteration 48700/ 115203 | consumed samples: 12467200 | consumed tokens: 25532825600 | elapsed time per iteration (s): 0.38 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 3.060706E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.072 | TFLOPs: 31.46 | 7: iteration 48800/ 115203 | consumed samples: 12492800 | consumed tokens: 25585254400 | elapsed time per iteration (s): 0.38 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 3.058851E+00 | grad norm: 0.291 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.351 | TFLOPs: 31.62 | 7: iteration 48900/ 115203 | consumed samples: 12518400 | consumed tokens: 25637683200 | elapsed time per iteration (s): 0.38 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 3.067480E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.220 | TFLOPs: 31.66 | 7: iteration 49000/ 115203 | consumed samples: 12544000 | consumed tokens: 25690112000 | elapsed time per iteration (s): 0.38 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 3.059245E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.933 | TFLOPs: 31.64 | 7: iteration 49100/ 115203 | consumed samples: 12569600 | consumed tokens: 25742540800 | elapsed time per iteration (s): 0.38 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 3.057929E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.358 | TFLOPs: 31.52 | 7: iteration 49200/ 115203 | consumed samples: 12595200 | consumed tokens: 25794969600 | elapsed time per iteration (s): 0.38 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 3.061223E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.482 | TFLOPs: 31.67 | 7: iteration 49300/ 115203 | consumed samples: 12620800 | consumed tokens: 25847398400 | elapsed time per iteration (s): 0.38 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 3.057676E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.959 | TFLOPs: 31.64 | 7: iteration 49400/ 115203 | consumed samples: 12646400 | consumed tokens: 25899827200 | elapsed time per iteration (s): 0.38 | 
learning rate: 1.315E-04 | global batch size: 256 | lm loss: 3.056433E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.221 | TFLOPs: 31.66 | 7: iteration 49500/ 115203 | consumed samples: 12672000 | consumed tokens: 25952256000 | elapsed time per iteration (s): 0.38 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 3.056531E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.334 | TFLOPs: 31.71 | 7: iteration 49600/ 115203 | consumed samples: 12697600 | consumed tokens: 26004684800 | elapsed time per iteration (s): 0.38 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 3.055078E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.516 | TFLOPs: 31.62 | 7: iteration 49700/ 115203 | consumed samples: 12723200 | consumed tokens: 26057113600 | elapsed time per iteration (s): 0.38 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 3.059481E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.574 | TFLOPs: 31.63 | 7: iteration 49800/ 115203 | consumed samples: 12748800 | consumed tokens: 26109542400 | elapsed time per iteration (s): 0.38 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 3.054511E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.281 | TFLOPs: 31.71 | 7: iteration 49900/ 115203 | consumed samples: 12774400 | consumed tokens: 26161971200 | elapsed time per iteration (s): 0.38 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 3.053400E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.311 | TFLOPs: 31.71 | 0: [2023-03-17 
00:10:41,236] [INFO] [logging.py:68:log_dist] [Rank 0] step=50000, skipped=0, lr=[0.00013010274525760026, 0.00013010274525760026, 0.00013010274525760026], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 50000/ 115203 | consumed samples: 12800000 | consumed tokens: 26214400000 | elapsed time per iteration (s): 0.38 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 3.054333E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.281 | TFLOPs: 31.66 |
0: steps: 50000 loss: 2.9987 iter time (s): 0.376 samples/sec: 681.669
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 50000 | lm loss value: 3.812437E+00 | lm loss PPL: 4.526059E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 50000 to checkpoints_146m60b100m
0: [2023-03-17 00:10:41,372] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step50000 is begin to save!
0: [2023-03-17 00:10:41,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_01-model_00-model_states.pt...
0: [2023-03-17 00:10:41,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_01-model_00-model_states.pt.
0: [2023-03-17 00:10:41,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_03-model_00-model_states.pt...
0: [2023-03-17 00:10:41,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_03-model_00-model_states.pt.
0: [2023-03-17 00:10:41,507] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_04-model_00-model_states.pt...
0: [2023-03-17 00:10:41,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_04-model_00-model_states.pt.
0: [2023-03-17 00:10:41,522] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_05-model_00-model_states.pt...
0: [2023-03-17 00:10:41,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_05-model_00-model_states.pt.
0: [2023-03-17 00:10:41,536] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_06-model_00-model_states.pt...
0: [2023-03-17 00:10:41,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_06-model_00-model_states.pt.
0: [2023-03-17 00:10:41,551] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_07-model_00-model_states.pt...
0: [2023-03-17 00:10:41,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_07-model_00-model_states.pt.
0: [2023-03-17 00:10:41,566] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_08-model_00-model_states.pt...
0: [2023-03-17 00:10:41,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_08-model_00-model_states.pt.
0: [2023-03-17 00:10:41,581] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_09-model_00-model_states.pt...
0: [2023-03-17 00:10:41,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_09-model_00-model_states.pt.
0: [2023-03-17 00:10:41,596] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_10-model_00-model_states.pt...
0: [2023-03-17 00:10:41,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_10-model_00-model_states.pt.
0: [2023-03-17 00:10:41,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_11-model_00-model_states.pt...
0: [2023-03-17 00:10:41,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_11-model_00-model_states.pt.
0: [2023-03-17 00:10:41,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_12-model_00-model_states.pt...
0: [2023-03-17 00:10:41,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_12-model_00-model_states.pt.
0: [2023-03-17 00:10:41,641] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_13-model_00-model_states.pt...
0: [2023-03-17 00:10:41,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_13-model_00-model_states.pt.
0: [2023-03-17 00:10:41,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_14-model_00-model_states.pt...
0: [2023-03-17 00:10:41,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_14-model_00-model_states.pt.
0: [2023-03-17 00:10:41,671] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_15-model_00-model_states.pt...
0: [2023-03-17 00:10:41,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_15-model_00-model_states.pt.
0: [2023-03-17 00:10:41,686] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_16-model_00-model_states.pt...
0: [2023-03-17 00:10:41,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_16-model_00-model_states.pt.
0: [2023-03-17 00:10:41,701] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_17-model_00-model_states.pt...
0: [2023-03-17 00:10:41,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_17-model_00-model_states.pt.
0: [2023-03-17 00:10:41,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/layer_19-model_00-model_states.pt...
0: [2023-03-17 00:10:41,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/layer_19-model_00-model_states.pt.
0: [2023-03-17 00:10:41,717] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step50000/mp_rank_00_model_states.pt
0: [2023-03-17 00:10:41,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/mp_rank_00_model_states.pt...
0: [2023-03-17 00:10:41,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/mp_rank_00_model_states.pt.
0: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
7: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
7: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt...
7: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
2: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt...
2: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt...
2: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt...
2: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt...
2: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt...
6: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt...
6: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt...
6: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt...
6: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt...
6: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt...
3: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt...
3: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt...
3: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt...
3: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:10:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:10:41,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:10:41,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:10:41,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:10:41,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 00:10:41,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:10:41,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:10:41,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:10:41,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:10:41,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:10:41,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 00:10:41,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:10:41,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 00:10:41,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 00:10:41,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:10:41,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:10:41,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:10:41,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:10:41,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:10:41,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:10:41,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 00:10:41,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 00:10:41,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 00:10:41,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:10:41,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:10:41,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 00:10:41,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:10:41,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 1: [2023-03-17 00:10:41,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:10:41,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 00:10:41,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:10:41,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:10:41,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 00:10:41,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:10:41,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:10:41,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 00:10:41,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 00:10:41,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 00:10:41,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:10:41,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:10:41,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
1: [2023-03-17 00:10:41,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:10:41,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:10:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:10:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 00:10:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:10:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:10:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 00:10:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:10:41,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:10:41,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:10:41,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 00:10:41,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:10:41,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:10:41,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:10:41,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:10:41,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 00:10:41,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:10:41,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:10:41,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:10:41,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:10:41,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:10:41,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:10:41,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:10:41,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:10:41,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 00:10:41,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 00:10:41,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:10:41,797] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:10:41,797] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 00:10:41,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:10:41,797] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:10:41,797] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 00:10:41,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:10:41,797] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:10:41,797] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 00:10:41,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:10:41,797] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:10:41,797] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:10:41,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:10:41,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:10:41,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:10:41,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
4: [2023-03-17 00:10:41,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:10:41,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:10:41,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 00:10:41,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 00:10:41,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:10:41,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 00:10:41,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:10:41,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:10:41,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:10:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:10:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:10:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:10:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:10:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:10:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:10:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:10:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 00:10:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 00:10:41,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:10:41,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:10:41,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:10:41,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:10:41,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:10:41,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:10:41,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:10:41,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:10:41,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:10:41,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:10:41,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 00:10:41,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:10:41,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:10:41,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:10:41,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 00:10:41,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 00:10:41,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 00:10:41,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 00:10:41,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
2: [2023-03-17 00:10:41,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 00:10:41,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 00:10:41,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:10:41,812] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:10:41,812] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: successfully saved checkpoint at iteration 50000 to checkpoints_146m60b100m 7: time (ms) | save-checkpoint: 446.55 7: iteration 50100/ 115203 | consumed samples: 12825600 | consumed tokens: 26266828800 | elapsed time per iteration (s): 0.38 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 3.058844E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.328 | TFLOPs: 31.24 | 7: iteration 50200/ 115203 | consumed samples: 12851200 | consumed tokens: 26319257600 | elapsed time per iteration (s): 0.38 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 3.053939E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.128 | TFLOPs: 31.51 | 7: iteration 50300/ 115203 | consumed samples: 12876800 | consumed tokens: 26371686400 | elapsed time per iteration (s): 0.38 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 3.058287E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.764 | TFLOPs: 31.78 | 7: iteration 50400/ 115203 | consumed samples: 12902400 | consumed tokens: 26424115200 | 
elapsed time per iteration (s): 0.38 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 3.056646E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.314 | TFLOPs: 31.61 | 7: iteration 50500/ 115203 | consumed samples: 12928000 | consumed tokens: 26476544000 | elapsed time per iteration (s): 0.38 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 3.051196E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.037 | TFLOPs: 31.74 | 7: iteration 50600/ 115203 | consumed samples: 12953600 | consumed tokens: 26528972800 | elapsed time per iteration (s): 0.38 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 3.052013E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.760 | TFLOPs: 31.82 | 7: iteration 50700/ 115203 | consumed samples: 12979200 | consumed tokens: 26581401600 | elapsed time per iteration (s): 0.38 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 3.052039E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.147 | TFLOPs: 31.79 | 7: iteration 50800/ 115203 | consumed samples: 13004800 | consumed tokens: 26633830400 | elapsed time per iteration (s): 0.38 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 3.058548E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.019 | TFLOPs: 31.74 | 7: iteration 50900/ 115203 | consumed samples: 13030400 | consumed tokens: 26686259200 | elapsed time per iteration (s): 0.38 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 3.051282E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.957 | 
TFLOPs: 31.83 | 7: iteration 51000/ 115203 | consumed samples: 13056000 | consumed tokens: 26738688000 | elapsed time per iteration (s): 0.38 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 3.052161E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.658 | TFLOPs: 31.82 | 7: iteration 51100/ 115203 | consumed samples: 13081600 | consumed tokens: 26791116800 | elapsed time per iteration (s): 0.38 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 3.054562E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.490 | TFLOPs: 31.86 | 7: iteration 51200/ 115203 | consumed samples: 13107200 | consumed tokens: 26843545600 | elapsed time per iteration (s): 0.38 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 3.051221E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.317 | TFLOPs: 31.85 | 7: iteration 51300/ 115203 | consumed samples: 13132800 | consumed tokens: 26895974400 | elapsed time per iteration (s): 0.38 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 3.047734E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.890 | TFLOPs: 31.64 | 7: iteration 51400/ 115203 | consumed samples: 13158400 | consumed tokens: 26948403200 | elapsed time per iteration (s): 0.38 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 3.051096E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.148 | TFLOPs: 31.65 | 7: iteration 51500/ 115203 | consumed samples: 13184000 | consumed tokens: 27000832000 | elapsed time per iteration (s): 0.38 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 3.049460E+00 | grad norm: 0.283 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.340 | TFLOPs: 31.43 | 7: iteration 51600/ 115203 | consumed samples: 13209600 | consumed tokens: 27053260800 | elapsed time per iteration (s): 0.38 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 3.050757E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.428 | TFLOPs: 31.48 | 7: iteration 51700/ 115203 | consumed samples: 13235200 | consumed tokens: 27105689600 | elapsed time per iteration (s): 0.38 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 3.054196E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.271 | TFLOPs: 31.43 | 7: iteration 51800/ 115203 | consumed samples: 13260800 | consumed tokens: 27158118400 | elapsed time per iteration (s): 0.38 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 3.054268E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.161 | TFLOPs: 31.14 | 7: iteration 51900/ 115203 | consumed samples: 13286400 | consumed tokens: 27210547200 | elapsed time per iteration (s): 0.38 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 3.051572E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.425 | TFLOPs: 31.57 | 0: [2023-03-17 00:23:16,952] [INFO] [logging.py:68:log_dist] [Rank 0] step=52000, skipped=0, lr=[0.00012524180298737348, 0.00012524180298737348, 0.00012524180298737348], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 52000/ 115203 | consumed samples: 13312000 | consumed tokens: 27262976000 | elapsed time per iteration (s): 0.38 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 3.054145E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 676.216 | TFLOPs: 31.56 | 0: steps: 52000 loss: 3.0471 iter time (s): 0.375 samples/sec: 681.917 7: iteration 52100/ 115203 | consumed samples: 13337600 | consumed tokens: 27315404800 | elapsed time per iteration (s): 0.38 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 3.052498E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.864 | TFLOPs: 31.55 | 7: iteration 52200/ 115203 | consumed samples: 13363200 | consumed tokens: 27367833600 | elapsed time per iteration (s): 0.38 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 3.044258E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.124 | TFLOPs: 31.79 | 7: iteration 52300/ 115203 | consumed samples: 13388800 | consumed tokens: 27420262400 | elapsed time per iteration (s): 0.38 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 3.046811E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.710 | TFLOPs: 31.77 | 7: iteration 52400/ 115203 | consumed samples: 13414400 | consumed tokens: 27472691200 | elapsed time per iteration (s): 0.38 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 3.049902E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.614 | TFLOPs: 31.82 | 7: iteration 52500/ 115203 | consumed samples: 13440000 | consumed tokens: 27525120000 | elapsed time per iteration (s): 0.38 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 3.050231E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.565 | TFLOPs: 31.72 | 7: iteration 52600/ 115203 | consumed samples: 13465600 | consumed tokens: 27577548800 
| elapsed time per iteration (s): 0.37 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 3.048914E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.817 | TFLOPs: 31.87 | 7: iteration 52700/ 115203 | consumed samples: 13491200 | consumed tokens: 27629977600 | elapsed time per iteration (s): 0.38 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 3.048055E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.962 | TFLOPs: 31.83 | 7: iteration 52800/ 115203 | consumed samples: 13516800 | consumed tokens: 27682406400 | elapsed time per iteration (s): 0.38 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 3.045393E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.280 | TFLOPs: 31.85 | 7: iteration 52900/ 115203 | consumed samples: 13542400 | consumed tokens: 27734835200 | elapsed time per iteration (s): 0.38 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 3.045984E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.252 | TFLOPs: 31.84 | 7: iteration 53000/ 115203 | consumed samples: 13568000 | consumed tokens: 27787264000 | elapsed time per iteration (s): 0.38 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 3.047330E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.926 | TFLOPs: 31.69 | 7: iteration 53100/ 115203 | consumed samples: 13593600 | consumed tokens: 27839692800 | elapsed time per iteration (s): 0.38 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 3.042435E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.202 | 
TFLOPs: 31.52 | 7: iteration 53200/ 115203 | consumed samples: 13619200 | consumed tokens: 27892121600 | elapsed time per iteration (s): 0.38 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 3.041798E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.787 | TFLOPs: 31.40 | 7: iteration 53300/ 115203 | consumed samples: 13644800 | consumed tokens: 27944550400 | elapsed time per iteration (s): 0.38 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 3.046139E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.176 | TFLOPs: 31.42 | 7: iteration 53400/ 115203 | consumed samples: 13670400 | consumed tokens: 27996979200 | elapsed time per iteration (s): 0.38 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 3.048138E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.214 | TFLOPs: 31.52 | 7: iteration 53500/ 115203 | consumed samples: 13696000 | consumed tokens: 28049408000 | elapsed time per iteration (s): 0.38 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 3.042705E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.998 | TFLOPs: 31.69 | 7: iteration 53600/ 115203 | consumed samples: 13721600 | consumed tokens: 28101836800 | elapsed time per iteration (s): 0.38 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 3.042508E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.962 | TFLOPs: 31.78 | 7: iteration 53700/ 115203 | consumed samples: 13747200 | consumed tokens: 28154265600 | elapsed time per iteration (s): 0.38 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 3.048806E+00 | grad norm: 0.281 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.969 | TFLOPs: 31.79 | 7: iteration 53800/ 115203 | consumed samples: 13772800 | consumed tokens: 28206694400 | elapsed time per iteration (s): 0.38 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 3.046500E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.418 | TFLOPs: 31.81 | 7: iteration 53900/ 115203 | consumed samples: 13798400 | consumed tokens: 28259123200 | elapsed time per iteration (s): 0.38 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 3.046689E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.759 | TFLOPs: 31.82 | 0: [2023-03-17 00:35:50,497] [INFO] [logging.py:68:log_dist] [Rank 0] step=54000, skipped=0, lr=[0.00012033461390561511, 0.00012033461390561511, 0.00012033461390561511], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 54000/ 115203 | consumed samples: 13824000 | consumed tokens: 28311552000 | elapsed time per iteration (s): 0.38 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 3.046224E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.794 | TFLOPs: 31.82 | 0: steps: 54000 loss: 3.0485 iter time (s): 0.375 samples/sec: 683.239 7: iteration 54100/ 115203 | consumed samples: 13849600 | consumed tokens: 28363980800 | elapsed time per iteration (s): 0.38 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 3.044531E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.160 | TFLOPs: 31.84 | 7: iteration 54200/ 115203 | consumed samples: 13875200 | consumed tokens: 28416409600 | elapsed time per iteration (s): 0.37 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 
3.041349E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.772 | TFLOPs: 31.87 | 7: iteration 54300/ 115203 | consumed samples: 13900800 | consumed tokens: 28468838400 | elapsed time per iteration (s): 0.38 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 3.039390E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.595 | TFLOPs: 31.86 | 7: iteration 54400/ 115203 | consumed samples: 13926400 | consumed tokens: 28521267200 | elapsed time per iteration (s): 0.38 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 3.042188E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.529 | TFLOPs: 31.86 | 7: iteration 54500/ 115203 | consumed samples: 13952000 | consumed tokens: 28573696000 | elapsed time per iteration (s): 0.37 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 3.042621E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.735 | TFLOPs: 31.87 | 7: iteration 54600/ 115203 | consumed samples: 13977600 | consumed tokens: 28626124800 | elapsed time per iteration (s): 0.38 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 3.045258E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.646 | TFLOPs: 31.86 | 7: iteration 54700/ 115203 | consumed samples: 14003200 | consumed tokens: 28678553600 | elapsed time per iteration (s): 0.38 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 3.040349E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.492 | TFLOPs: 31.86 | 7: iteration 54800/ 115203 | consumed samples: 14028800 | consumed tokens: 28730982400 | 
elapsed time per iteration (s): 0.38 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 3.041004E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.344 | TFLOPs: 31.85 | 7: iteration 54900/ 115203 | consumed samples: 14054400 | consumed tokens: 28783411200 | elapsed time per iteration (s): 0.38 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 3.041291E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.280 | TFLOPs: 31.85 | 7: iteration 55000/ 115203 | consumed samples: 14080000 | consumed tokens: 28835840000 | elapsed time per iteration (s): 0.38 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 3.039446E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.575 | TFLOPs: 31.86 | 7: iteration 55100/ 115203 | consumed samples: 14105600 | consumed tokens: 28888268800 | elapsed time per iteration (s): 0.37 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 3.040291E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.675 | TFLOPs: 31.86 | 7: iteration 55200/ 115203 | consumed samples: 14131200 | consumed tokens: 28940697600 | elapsed time per iteration (s): 0.38 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 3.039194E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.506 | TFLOPs: 31.86 | 7: iteration 55300/ 115203 | consumed samples: 14156800 | consumed tokens: 28993126400 | elapsed time per iteration (s): 0.37 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 3.036566E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.017 | 
TFLOPs: 31.88 | 7: iteration 55400/ 115203 | consumed samples: 14182400 | consumed tokens: 29045555200 | elapsed time per iteration (s): 0.37 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 3.035428E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.580 | TFLOPs: 31.91 | 7: iteration 55500/ 115203 | consumed samples: 14208000 | consumed tokens: 29097984000 | elapsed time per iteration (s): 0.37 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 3.042014E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.087 | TFLOPs: 31.93 | 7: iteration 55600/ 115203 | consumed samples: 14233600 | consumed tokens: 29150412800 | elapsed time per iteration (s): 0.37 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 3.036330E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.106 | TFLOPs: 31.93 | 7: iteration 55700/ 115203 | consumed samples: 14259200 | consumed tokens: 29202841600 | elapsed time per iteration (s): 0.37 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 3.035648E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.353 | TFLOPs: 31.94 | 7: iteration 55800/ 115203 | consumed samples: 14284800 | consumed tokens: 29255270400 | elapsed time per iteration (s): 0.37 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 3.040992E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.226 | TFLOPs: 31.94 | 7: iteration 55900/ 115203 | consumed samples: 14310400 | consumed tokens: 29307699200 | elapsed time per iteration (s): 0.37 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 3.033125E+00 | grad norm: 0.281 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.021 | TFLOPs: 31.93 | 0: [2023-03-17 00:48:20,051] [INFO] [logging.py:68:log_dist] [Rank 0] step=56000, skipped=0, lr=[0.00011539606744822729, 0.00011539606744822729, 0.00011539606744822729], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 56000/ 115203 | consumed samples: 14336000 | consumed tokens: 29360128000 | elapsed time per iteration (s): 0.37 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 3.032571E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.784 | TFLOPs: 31.92 | 0: steps: 56000 loss: 3.0591 iter time (s): 0.373 samples/sec: 686.783 7: iteration 56100/ 115203 | consumed samples: 14361600 | consumed tokens: 29412556800 | elapsed time per iteration (s): 0.37 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 3.040749E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.152 | TFLOPs: 31.89 | 7: iteration 56200/ 115203 | consumed samples: 14387200 | consumed tokens: 29464985600 | elapsed time per iteration (s): 0.37 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 3.037292E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.150 | TFLOPs: 31.89 | 7: iteration 56300/ 115203 | consumed samples: 14412800 | consumed tokens: 29517414400 | elapsed time per iteration (s): 0.38 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 3.036797E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.502 | TFLOPs: 31.86 | 7: iteration 56400/ 115203 | consumed samples: 14438400 | consumed tokens: 29569843200 | elapsed time per iteration (s): 0.38 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 
3.036779E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.460 | TFLOPs: 31.85 | 7: iteration 56500/ 115203 | consumed samples: 14464000 | consumed tokens: 29622272000 | elapsed time per iteration (s): 0.38 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 3.035602E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.559 | TFLOPs: 31.86 | 7: iteration 56600/ 115203 | consumed samples: 14489600 | consumed tokens: 29674700800 | elapsed time per iteration (s): 0.38 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 3.038798E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.237 | TFLOPs: 31.84 | 7: iteration 56700/ 115203 | consumed samples: 14515200 | consumed tokens: 29727129600 | elapsed time per iteration (s): 0.38 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 3.037284E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.158 | TFLOPs: 31.84 | 7: iteration 56800/ 115203 | consumed samples: 14540800 | consumed tokens: 29779558400 | elapsed time per iteration (s): 0.38 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 3.033983E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.418 | TFLOPs: 31.81 | 7: iteration 56900/ 115203 | consumed samples: 14566400 | consumed tokens: 29831987200 | elapsed time per iteration (s): 0.38 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 3.037655E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.185 | TFLOPs: 31.84 | 7: iteration 57000/ 115203 | consumed samples: 14592000 | consumed tokens: 29884416000 | 
elapsed time per iteration (s): 0.38 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 3.036363E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.155 | TFLOPs: 31.84 | 7: iteration 57100/ 115203 | consumed samples: 14617600 | consumed tokens: 29936844800 | elapsed time per iteration (s): 0.38 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 3.034665E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.048 | TFLOPs: 31.84 | 7: iteration 57200/ 115203 | consumed samples: 14643200 | consumed tokens: 29989273600 | elapsed time per iteration (s): 0.38 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 3.036707E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.955 | TFLOPs: 31.78 | 7: iteration 57300/ 115203 | consumed samples: 14668800 | consumed tokens: 30041702400 | elapsed time per iteration (s): 0.38 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 3.033232E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.799 | TFLOPs: 31.78 | 7: iteration 57400/ 115203 | consumed samples: 14694400 | consumed tokens: 30094131200 | elapsed time per iteration (s): 0.38 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 3.036060E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.721 | TFLOPs: 31.82 | 7: iteration 57500/ 115203 | consumed samples: 14720000 | consumed tokens: 30146560000 | elapsed time per iteration (s): 0.38 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 3.029893E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.708 | 
TFLOPs: 31.82 | 7: iteration 57600/ 115203 | consumed samples: 14745600 | consumed tokens: 30198988800 | elapsed time per iteration (s): 0.38 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 3.033259E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.944 | TFLOPs: 31.83 | 7: iteration 57700/ 115203 | consumed samples: 14771200 | consumed tokens: 30251417600 | elapsed time per iteration (s): 0.38 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 3.028911E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.986 | TFLOPs: 31.83 | 7: iteration 57800/ 115203 | consumed samples: 14796800 | consumed tokens: 30303846400 | elapsed time per iteration (s): 0.38 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 3.029567E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.419 | TFLOPs: 31.81 | 7: iteration 57900/ 115203 | consumed samples: 14822400 | consumed tokens: 30356275200 | elapsed time per iteration (s): 0.38 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 3.034067E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.451 | TFLOPs: 31.81 | 0: [2023-03-17 01:00:50,809] [INFO] [logging.py:68:log_dist] [Rank 0] step=58000, skipped=0, lr=[0.00011044114819593482, 0.00011044114819593482, 0.00011044114819593482], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 58000/ 115203 | consumed samples: 14848000 | consumed tokens: 30408704000 | elapsed time per iteration (s): 0.38 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 3.027828E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.541 | TFLOPs: 31.81 | 0: steps: 58000 
loss: 3.0050 iter time (s): 0.373 samples/sec: 685.679 7: iteration 58100/ 115203 | consumed samples: 14873600 | consumed tokens: 30461132800 | elapsed time per iteration (s): 0.38 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 3.032242E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.358 | TFLOPs: 31.80 | 7: iteration 58200/ 115203 | consumed samples: 14899200 | consumed tokens: 30513561600 | elapsed time per iteration (s): 0.38 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 3.031001E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.821 | TFLOPs: 31.82 | 7: iteration 58300/ 115203 | consumed samples: 14924800 | consumed tokens: 30565990400 | elapsed time per iteration (s): 0.38 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 3.033054E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.367 | TFLOPs: 31.80 | 7: iteration 58400/ 115203 | consumed samples: 14950400 | consumed tokens: 30618419200 | elapsed time per iteration (s): 0.38 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 3.030525E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.379 | TFLOPs: 31.76 | 7: iteration 58500/ 115203 | consumed samples: 14976000 | consumed tokens: 30670848000 | elapsed time per iteration (s): 0.38 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 3.031536E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.328 | TFLOPs: 31.80 | 7: iteration 58600/ 115203 | consumed samples: 15001600 | consumed tokens: 30723276800 | elapsed time per iteration (s): 0.38 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 
3.032050E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.358 | TFLOPs: 31.71 | 7: iteration 58700/ 115203 | consumed samples: 15027200 | consumed tokens: 30775705600 | elapsed time per iteration (s): 0.38 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 3.031414E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.570 | TFLOPs: 31.81 | 7: iteration 58800/ 115203 | consumed samples: 15052800 | consumed tokens: 30828134400 | elapsed time per iteration (s): 0.38 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 3.027497E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.459 | TFLOPs: 31.76 | 7: iteration 58900/ 115203 | consumed samples: 15078400 | consumed tokens: 30880563200 | elapsed time per iteration (s): 0.38 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 3.031089E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.671 | TFLOPs: 31.82 | 7: iteration 59000/ 115203 | consumed samples: 15104000 | consumed tokens: 30932992000 | elapsed time per iteration (s): 0.38 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 3.029467E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.571 | TFLOPs: 31.67 | 7: iteration 59100/ 115203 | consumed samples: 15129600 | consumed tokens: 30985420800 | elapsed time per iteration (s): 0.39 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 3.028705E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.245 | TFLOPs: 30.91 | 7: iteration 59200/ 115203 | consumed samples: 15155200 | consumed tokens: 31037849600 | 
elapsed time per iteration (s): 0.38 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 3.027512E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.481 | TFLOPs: 31.30 | 7: iteration 59300/ 115203 | consumed samples: 15180800 | consumed tokens: 31090278400 | elapsed time per iteration (s): 0.38 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 3.028266E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.713 | TFLOPs: 31.26 | 7: iteration 59400/ 115203 | consumed samples: 15206400 | consumed tokens: 31142707200 | elapsed time per iteration (s): 0.38 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 3.026590E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.839 | TFLOPs: 31.22 | 7: iteration 59500/ 115203 | consumed samples: 15232000 | consumed tokens: 31195136000 | elapsed time per iteration (s): 0.38 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 3.025228E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.944 | TFLOPs: 31.13 | 7: iteration 59600/ 115203 | consumed samples: 15257600 | consumed tokens: 31247564800 | elapsed time per iteration (s): 0.38 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 3.025771E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.853 | TFLOPs: 31.17 | 7: iteration 59700/ 115203 | consumed samples: 15283200 | consumed tokens: 31299993600 | elapsed time per iteration (s): 0.38 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 3.028944E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.137 | 
TFLOPs: 31.09 | 7: iteration 59800/ 115203 | consumed samples: 15308800 | consumed tokens: 31352422400 | elapsed time per iteration (s): 0.38 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 3.029201E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.501 | TFLOPs: 31.25 | 7: iteration 59900/ 115203 | consumed samples: 15334400 | consumed tokens: 31404851200 | elapsed time per iteration (s): 0.38 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 3.028484E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.095 | TFLOPs: 31.18 | 0: [2023-03-17 01:13:29,980] [INFO] [logging.py:68:log_dist] [Rank 0] step=60000, skipped=0, lr=[0.00010548489040793946, 0.00010548489040793946, 0.00010548489040793946], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 60000/ 115203 | consumed samples: 15360000 | consumed tokens: 31457280000 | elapsed time per iteration (s): 0.38 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 3.028327E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.008 | TFLOPs: 31.37 | 0: steps: 60000 loss: 3.0484 iter time (s): 0.377 samples/sec: 678.227 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 60000 | lm loss value: 3.845560E+00 | lm loss PPL: 4.678488E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 60000 to checkpoints_146m60b100m 0: [2023-03-17 01:13:30,108] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step60000 is begin to save! 
0: [2023-03-17 01:13:30,114] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:13:30,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:13:30,215] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:13:30,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:13:30,232] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:13:30,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:13:30,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:13:30,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:13:30,263] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:13:30,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:13:30,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_07-model_00-model_states.pt... 0: [2023-03-17 01:13:30,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_07-model_00-model_states.pt. 
0: [2023-03-17 01:13:30,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:13:30,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:13:30,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_09-model_00-model_states.pt... 0: [2023-03-17 01:13:30,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_09-model_00-model_states.pt. 0: [2023-03-17 01:13:30,323] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_10-model_00-model_states.pt... 0: [2023-03-17 01:13:30,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_10-model_00-model_states.pt. 0: [2023-03-17 01:13:30,338] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_11-model_00-model_states.pt... 0: [2023-03-17 01:13:30,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_11-model_00-model_states.pt. 0: [2023-03-17 01:13:30,353] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_12-model_00-model_states.pt... 0: [2023-03-17 01:13:30,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_12-model_00-model_states.pt. 0: [2023-03-17 01:13:30,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_13-model_00-model_states.pt... 0: [2023-03-17 01:13:30,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_13-model_00-model_states.pt. 
0: [2023-03-17 01:13:30,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_14-model_00-model_states.pt... 0: [2023-03-17 01:13:30,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_14-model_00-model_states.pt. 0: [2023-03-17 01:13:30,398] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_15-model_00-model_states.pt... 0: [2023-03-17 01:13:30,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_15-model_00-model_states.pt. 0: [2023-03-17 01:13:30,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_16-model_00-model_states.pt... 0: [2023-03-17 01:13:30,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_16-model_00-model_states.pt. 0: [2023-03-17 01:13:30,428] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_17-model_00-model_states.pt... 0: [2023-03-17 01:13:30,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_17-model_00-model_states.pt. 0: [2023-03-17 01:13:30,443] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/layer_19-model_00-model_states.pt... 0: [2023-03-17 01:13:30,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/layer_19-model_00-model_states.pt. 0: [2023-03-17 01:13:30,445] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step60000/mp_rank_00_model_states.pt 0: [2023-03-17 01:13:30,445] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/mp_rank_00_model_states.pt... 
0: [2023-03-17 01:13:30,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:13:30,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:13:30,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:13:30,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:13:30,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:13:30,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:13:30,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:13:30,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:13:30,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:13:30,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:13:30,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:13:30,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:13:30,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:13:30,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:13:30,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
0: [2023-03-17 01:13:30,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:13:30,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:13:30,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:13:30,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:13:30,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:13:30,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:13:30,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:13:30,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:13:30,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:13:30,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:13:30,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
1: [2023-03-17 01:13:30,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:13:30,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:13:30,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:13:30,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:13:30,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:13:30,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:13:30,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:13:30,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:13:30,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:13:30,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:13:30,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:13:30,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:13:30,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:13:30,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:13:30,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:13:30,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:13:30,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:13:30,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:13:30,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:13:30,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:13:30,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:13:30,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
0: [2023-03-17 01:13:30,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:13:30,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:13:30,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:13:30,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:13:30,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:13:30,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:13:30,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:13:30,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:13:30,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:13:30,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
4: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
7: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:13:30,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:13:30,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:13:30,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:13:30,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:13:30,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:13:30,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:13:30,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:13:30,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:13:30,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:13:30,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:13:30,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:13:30,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:13:30,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:13:30,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:13:30,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:13:30,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:13:30,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:13:30,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:13:30,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:13:30,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:13:30,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:13:30,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:13:30,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:13:30,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:13:30,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:13:30,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:13:30,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:13:30,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:13:30,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:13:30,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:13:30,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:13:30,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:13:30,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:13:30,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:13:30,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:13:30,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:13:30,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:13:30,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:13:30,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:13:30,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:13:30,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 01:13:30,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:13:30,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now!
5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now!
5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now!
5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now!
5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now!
5: [2023-03-17 01:13:30,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now!
0: successfully saved checkpoint at iteration 60000 to checkpoints_146m60b100m
7: time (ms) | save-checkpoint: 450.06
7: iteration 60100/ 115203 | consumed samples: 15385600 | consumed tokens: 31509708800 | elapsed time per iteration (s): 0.39 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 3.028603E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 660.703 | TFLOPs: 30.84 |
7: iteration 60200/ 115203 | consumed samples: 15411200 | consumed tokens: 31562137600 | elapsed time per iteration (s): 0.38 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 3.025972E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.282 | TFLOPs: 31.19 |
7: iteration 60300/ 115203 | consumed samples: 15436800 | consumed tokens: 31614566400 | elapsed time per iteration (s): 0.38 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 3.024474E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.878 | TFLOPs: 31.17 |
7: iteration 60400/ 115203 | consumed samples: 15462400 | consumed tokens: 31666995200 | elapsed time per iteration (s): 0.38 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 3.031328E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.941 | TFLOPs: 31.04 |
7: iteration 60500/ 115203 | consumed samples: 15488000 | consumed tokens: 31719424000 | elapsed time per iteration (s): 0.39 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 3.028842E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.274 | TFLOPs: 30.96 |
7: iteration 60600/ 115203 | consumed samples: 15513600 | consumed tokens: 31771852800 | elapsed time per iteration (s): 0.39 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 3.027006E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 659.409 | TFLOPs: 30.78 |
7: iteration 60700/ 115203 | consumed samples: 15539200 | consumed tokens: 31824281600 | elapsed time per iteration (s): 0.39 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 3.027518E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 658.053 | TFLOPs: 30.72 |
7: iteration 60800/ 115203 | consumed samples: 15564800 | consumed tokens: 31876710400 | elapsed time per iteration (s): 0.39 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 3.020249E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 655.812 | TFLOPs: 30.61 |
7: iteration 60900/ 115203 | consumed samples: 15590400 | consumed tokens: 31929139200 | elapsed time per iteration (s): 0.39 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 3.024144E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 661.136 | TFLOPs: 30.86 |
7: iteration 61000/ 115203 | consumed samples: 15616000 | consumed tokens: 31981568000 | elapsed time per iteration (s): 0.39 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 3.025744E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.980 | TFLOPs: 30.99 |
7: iteration 61100/ 115203 | consumed samples: 15641600 | consumed tokens: 32033996800 | elapsed time per iteration (s): 0.39 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 3.024025E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.127 | TFLOPs: 31.00 |
7: iteration 61200/ 115203 | consumed samples: 15667200 | consumed tokens: 32086425600 | elapsed time per iteration (s): 0.38 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 3.025080E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.987 | TFLOPs: 31.23 |
7: iteration 61300/ 115203 | consumed samples: 15692800 | consumed tokens: 32138854400 | elapsed time per iteration (s): 0.39 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 3.023904E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.324 | TFLOPs: 31.01 |
7: iteration 61400/ 115203 | consumed samples: 15718400 | consumed tokens: 32191283200 | elapsed time per iteration (s): 0.39 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 3.025539E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.829 | TFLOPs: 30.99 |
7: iteration 61500/ 115203 | consumed samples: 15744000 | consumed tokens: 32243712000 | elapsed time per iteration (s): 0.39 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 3.022549E+00 | grad norm: 0.304 |
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.696 | TFLOPs: 31.03 | 7: iteration 61600/ 115203 | consumed samples: 15769600 | consumed tokens: 32296140800 | elapsed time per iteration (s): 0.38 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 3.022973E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.172 | TFLOPs: 31.09 | 7: iteration 61700/ 115203 | consumed samples: 15795200 | consumed tokens: 32348569600 | elapsed time per iteration (s): 0.39 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 3.024802E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.887 | TFLOPs: 31.03 | 7: iteration 61800/ 115203 | consumed samples: 15820800 | consumed tokens: 32400998400 | elapsed time per iteration (s): 0.39 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 3.025818E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.035 | TFLOPs: 30.95 | 7: iteration 61900/ 115203 | consumed samples: 15846400 | consumed tokens: 32453427200 | elapsed time per iteration (s): 0.39 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 3.019257E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 661.501 | TFLOPs: 30.88 | 0: [2023-03-17 01:26:21,673] [INFO] [logging.py:68:log_dist] [Rank 0] step=62000, skipped=0, lr=[0.0001005423324048397, 0.0001005423324048397, 0.0001005423324048397], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 62000/ 115203 | consumed samples: 15872000 | consumed tokens: 32505856000 | elapsed time per iteration (s): 0.39 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 3.017709E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 664.833 | TFLOPs: 31.03 | 0: steps: 62000 loss: 3.0650 iter time (s): 0.384 samples/sec: 666.591 7: iteration 62100/ 115203 | consumed samples: 15897600 | consumed tokens: 32558284800 | elapsed time per iteration (s): 0.38 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 3.025167E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.272 | TFLOPs: 31.05 | 7: iteration 62200/ 115203 | consumed samples: 15923200 | consumed tokens: 32610713600 | elapsed time per iteration (s): 0.38 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 3.018351E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.390 | TFLOPs: 31.29 | 7: iteration 62300/ 115203 | consumed samples: 15948800 | consumed tokens: 32663142400 | elapsed time per iteration (s): 0.39 | learning rate: 9.980E-05 | global batch size: 256 | lm loss: 3.023346E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.036 | TFLOPs: 30.99 | 7: iteration 62400/ 115203 | consumed samples: 15974400 | consumed tokens: 32715571200 | elapsed time per iteration (s): 0.38 | learning rate: 9.956E-05 | global batch size: 256 | lm loss: 3.020514E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.017 | TFLOPs: 31.27 | 7: iteration 62500/ 115203 | consumed samples: 16000000 | consumed tokens: 32768000000 | elapsed time per iteration (s): 0.38 | learning rate: 9.931E-05 | global batch size: 256 | lm loss: 3.022338E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.828 | TFLOPs: 31.36 | 7: iteration 62600/ 115203 | consumed samples: 16025600 | consumed tokens: 32820428800 
| elapsed time per iteration (s): 0.38 | learning rate: 9.906E-05 | global batch size: 256 | lm loss: 3.023372E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.798 | TFLOPs: 31.12 | 7: iteration 62700/ 115203 | consumed samples: 16051200 | consumed tokens: 32872857600 | elapsed time per iteration (s): 0.39 | learning rate: 9.882E-05 | global batch size: 256 | lm loss: 3.020893E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.787 | TFLOPs: 30.94 | 7: iteration 62800/ 115203 | consumed samples: 16076800 | consumed tokens: 32925286400 | elapsed time per iteration (s): 0.38 | learning rate: 9.857E-05 | global batch size: 256 | lm loss: 3.019265E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.317 | TFLOPs: 31.15 | 7: iteration 62900/ 115203 | consumed samples: 16102400 | consumed tokens: 32977715200 | elapsed time per iteration (s): 0.38 | learning rate: 9.833E-05 | global batch size: 256 | lm loss: 3.018860E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.086 | TFLOPs: 31.04 | 7: iteration 63000/ 115203 | consumed samples: 16128000 | consumed tokens: 33030144000 | elapsed time per iteration (s): 0.38 | learning rate: 9.808E-05 | global batch size: 256 | lm loss: 3.018956E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.617 | TFLOPs: 31.35 | 7: iteration 63100/ 115203 | consumed samples: 16153600 | consumed tokens: 33082572800 | elapsed time per iteration (s): 0.38 | learning rate: 9.784E-05 | global batch size: 256 | lm loss: 3.016261E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.142 | 
TFLOPs: 31.33 | 7: iteration 63200/ 115203 | consumed samples: 16179200 | consumed tokens: 33135001600 | elapsed time per iteration (s): 0.38 | learning rate: 9.759E-05 | global batch size: 256 | lm loss: 3.014599E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.498 | TFLOPs: 31.11 | 7: iteration 63300/ 115203 | consumed samples: 16204800 | consumed tokens: 33187430400 | elapsed time per iteration (s): 0.38 | learning rate: 9.734E-05 | global batch size: 256 | lm loss: 3.019594E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.569 | TFLOPs: 31.39 | 7: iteration 63400/ 115203 | consumed samples: 16230400 | consumed tokens: 33239859200 | elapsed time per iteration (s): 0.38 | learning rate: 9.710E-05 | global batch size: 256 | lm loss: 3.016293E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.276 | TFLOPs: 31.43 | 7: iteration 63500/ 115203 | consumed samples: 16256000 | consumed tokens: 33292288000 | elapsed time per iteration (s): 0.38 | learning rate: 9.685E-05 | global batch size: 256 | lm loss: 3.019498E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.885 | TFLOPs: 31.36 | 7: iteration 63600/ 115203 | consumed samples: 16281600 | consumed tokens: 33344716800 | elapsed time per iteration (s): 0.38 | learning rate: 9.661E-05 | global batch size: 256 | lm loss: 3.014578E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.890 | TFLOPs: 31.41 | 7: iteration 63700/ 115203 | consumed samples: 16307200 | consumed tokens: 33397145600 | elapsed time per iteration (s): 0.38 | learning rate: 9.636E-05 | global batch size: 256 | lm loss: 3.017637E+00 | grad norm: 0.292 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.950 | TFLOPs: 31.41 | 7: iteration 63800/ 115203 | consumed samples: 16332800 | consumed tokens: 33449574400 | elapsed time per iteration (s): 0.38 | learning rate: 9.612E-05 | global batch size: 256 | lm loss: 3.019787E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.080 | TFLOPs: 31.51 | 7: iteration 63900/ 115203 | consumed samples: 16358400 | consumed tokens: 33502003200 | elapsed time per iteration (s): 0.38 | learning rate: 9.587E-05 | global batch size: 256 | lm loss: 3.020275E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.412 | TFLOPs: 31.62 | 0: [2023-03-17 01:39:05,719] [INFO] [logging.py:68:log_dist] [Rank 0] step=64000, skipped=0, lr=[9.56284709392273e-05, 9.56284709392273e-05, 9.56284709392273e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 64000/ 115203 | consumed samples: 16384000 | consumed tokens: 33554432000 | elapsed time per iteration (s): 0.38 | learning rate: 9.563E-05 | global batch size: 256 | lm loss: 3.020065E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.927 | TFLOPs: 31.46 | 0: steps: 64000 loss: 2.9851 iter time (s): 0.380 samples/sec: 672.807 7: iteration 64100/ 115203 | consumed samples: 16409600 | consumed tokens: 33606860800 | elapsed time per iteration (s): 0.38 | learning rate: 9.538E-05 | global batch size: 256 | lm loss: 3.016848E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.058 | TFLOPs: 31.46 | 7: iteration 64200/ 115203 | consumed samples: 16435200 | consumed tokens: 33659289600 | elapsed time per iteration (s): 0.38 | learning rate: 9.514E-05 | global batch size: 256 | lm loss: 3.013710E+00 
| grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.639 | TFLOPs: 31.49 | 7: iteration 64300/ 115203 | consumed samples: 16460800 | consumed tokens: 33711718400 | elapsed time per iteration (s): 0.38 | learning rate: 9.489E-05 | global batch size: 256 | lm loss: 3.013763E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.055 | TFLOPs: 31.56 | 7: iteration 64400/ 115203 | consumed samples: 16486400 | consumed tokens: 33764147200 | elapsed time per iteration (s): 0.38 | learning rate: 9.465E-05 | global batch size: 256 | lm loss: 3.016596E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.579 | TFLOPs: 31.67 | 7: iteration 64500/ 115203 | consumed samples: 16512000 | consumed tokens: 33816576000 | elapsed time per iteration (s): 0.38 | learning rate: 9.441E-05 | global batch size: 256 | lm loss: 3.018826E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.109 | TFLOPs: 31.37 | 7: iteration 64600/ 115203 | consumed samples: 16537600 | consumed tokens: 33869004800 | elapsed time per iteration (s): 0.38 | learning rate: 9.416E-05 | global batch size: 256 | lm loss: 3.019543E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.421 | TFLOPs: 31.62 | 7: iteration 64700/ 115203 | consumed samples: 16563200 | consumed tokens: 33921433600 | elapsed time per iteration (s): 0.38 | learning rate: 9.392E-05 | global batch size: 256 | lm loss: 3.017658E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.695 | TFLOPs: 31.73 | 7: iteration 64800/ 115203 | consumed samples: 16588800 | consumed tokens: 33973862400 | elapsed time 
per iteration (s): 0.38 | learning rate: 9.367E-05 | global batch size: 256 | lm loss: 3.013934E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.335 | TFLOPs: 31.66 | 7: iteration 64900/ 115203 | consumed samples: 16614400 | consumed tokens: 34026291200 | elapsed time per iteration (s): 0.38 | learning rate: 9.343E-05 | global batch size: 256 | lm loss: 3.019464E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.334 | TFLOPs: 31.76 | 7: iteration 65000/ 115203 | consumed samples: 16640000 | consumed tokens: 34078720000 | elapsed time per iteration (s): 0.38 | learning rate: 9.319E-05 | global batch size: 256 | lm loss: 3.012343E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.366 | TFLOPs: 31.66 | 7: iteration 65100/ 115203 | consumed samples: 16665600 | consumed tokens: 34131148800 | elapsed time per iteration (s): 0.38 | learning rate: 9.294E-05 | global batch size: 256 | lm loss: 3.010912E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.437 | TFLOPs: 31.76 | 7: iteration 65200/ 115203 | consumed samples: 16691200 | consumed tokens: 34183577600 | elapsed time per iteration (s): 0.38 | learning rate: 9.270E-05 | global batch size: 256 | lm loss: 3.013334E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.909 | TFLOPs: 31.64 | 7: iteration 65300/ 115203 | consumed samples: 16716800 | consumed tokens: 34236006400 | elapsed time per iteration (s): 0.38 | learning rate: 9.246E-05 | global batch size: 256 | lm loss: 3.014456E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.416 | TFLOPs: 31.81 | 
7: iteration 65400/ 115203 | consumed samples: 16742400 | consumed tokens: 34288435200 | elapsed time per iteration (s): 0.38 | learning rate: 9.221E-05 | global batch size: 256 | lm loss: 3.015936E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.495 | TFLOPs: 31.67 | 7: iteration 65500/ 115203 | consumed samples: 16768000 | consumed tokens: 34340864000 | elapsed time per iteration (s): 0.38 | learning rate: 9.197E-05 | global batch size: 256 | lm loss: 3.012766E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.147 | TFLOPs: 31.65 | 7: iteration 65600/ 115203 | consumed samples: 16793600 | consumed tokens: 34393292800 | elapsed time per iteration (s): 0.38 | learning rate: 9.173E-05 | global batch size: 256 | lm loss: 3.012928E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.678 | TFLOPs: 31.77 | 7: iteration 65700/ 115203 | consumed samples: 16819200 | consumed tokens: 34445721600 | elapsed time per iteration (s): 0.38 | learning rate: 9.149E-05 | global batch size: 256 | lm loss: 3.011373E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.900 | TFLOPs: 31.78 | 7: iteration 65800/ 115203 | consumed samples: 16844800 | consumed tokens: 34498150400 | elapsed time per iteration (s): 0.38 | learning rate: 9.124E-05 | global batch size: 256 | lm loss: 3.011943E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.020 | TFLOPs: 31.79 | 7: iteration 65900/ 115203 | consumed samples: 16870400 | consumed tokens: 34550579200 | elapsed time per iteration (s): 0.38 | learning rate: 9.100E-05 | global batch size: 256 | lm loss: 3.015779E+00 | grad norm: 0.292 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.581 | TFLOPs: 31.81 | 0: [2023-03-17 01:51:40,220] [INFO] [logging.py:68:log_dist] [Rank 0] step=66000, skipped=0, lr=[9.075821569240965e-05, 9.075821569240965e-05, 9.075821569240965e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 66000/ 115203 | consumed samples: 16896000 | consumed tokens: 34603008000 | elapsed time per iteration (s): 0.38 | learning rate: 9.076E-05 | global batch size: 256 | lm loss: 3.010712E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.922 | TFLOPs: 31.83 | 0: steps: 66000 loss: 3.0045 iter time (s): 0.376 samples/sec: 681.589 7: iteration 66100/ 115203 | consumed samples: 16921600 | consumed tokens: 34655436800 | elapsed time per iteration (s): 0.38 | learning rate: 9.052E-05 | global batch size: 256 | lm loss: 3.009379E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.928 | TFLOPs: 31.83 | 7: iteration 66200/ 115203 | consumed samples: 16947200 | consumed tokens: 34707865600 | elapsed time per iteration (s): 0.38 | learning rate: 9.027E-05 | global batch size: 256 | lm loss: 3.008631E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.644 | TFLOPs: 31.82 | 7: iteration 66300/ 115203 | consumed samples: 16972800 | consumed tokens: 34760294400 | elapsed time per iteration (s): 0.38 | learning rate: 9.003E-05 | global batch size: 256 | lm loss: 3.006749E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.217 | TFLOPs: 31.84 | 7: iteration 66400/ 115203 | consumed samples: 16998400 | consumed tokens: 34812723200 | elapsed time per iteration (s): 0.38 | learning rate: 8.979E-05 | global batch size: 256 | lm loss: 3.007591E+00 | grad norm: 
0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.548 | TFLOPs: 31.77 | 7: iteration 66500/ 115203 | consumed samples: 17024000 | consumed tokens: 34865152000 | elapsed time per iteration (s): 0.38 | learning rate: 8.955E-05 | global batch size: 256 | lm loss: 3.010804E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.724 | TFLOPs: 31.82 | 7: iteration 66600/ 115203 | consumed samples: 17049600 | consumed tokens: 34917580800 | elapsed time per iteration (s): 0.38 | learning rate: 8.931E-05 | global batch size: 256 | lm loss: 3.013642E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.117 | TFLOPs: 31.84 | 7: iteration 66700/ 115203 | consumed samples: 17075200 | consumed tokens: 34970009600 | elapsed time per iteration (s): 0.38 | learning rate: 8.907E-05 | global batch size: 256 | lm loss: 3.009887E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.998 | TFLOPs: 31.83 | 7: iteration 66800/ 115203 | consumed samples: 17100800 | consumed tokens: 35022438400 | elapsed time per iteration (s): 0.38 | learning rate: 8.883E-05 | global batch size: 256 | lm loss: 3.011122E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.673 | TFLOPs: 31.82 | 7: iteration 66900/ 115203 | consumed samples: 17126400 | consumed tokens: 35074867200 | elapsed time per iteration (s): 0.38 | learning rate: 8.858E-05 | global batch size: 256 | lm loss: 3.011425E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.281 | TFLOPs: 31.85 | 7: iteration 67000/ 115203 | consumed samples: 17152000 | consumed tokens: 35127296000 | elapsed time per 
iteration (s): 0.38 | learning rate: 8.834E-05 | global batch size: 256 | lm loss: 3.010535E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.898 | TFLOPs: 31.83 | 7: iteration 67100/ 115203 | consumed samples: 17177600 | consumed tokens: 35179724800 | elapsed time per iteration (s): 0.38 | learning rate: 8.810E-05 | global batch size: 256 | lm loss: 3.008127E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.047 | TFLOPs: 31.84 | 7: iteration 67200/ 115203 | consumed samples: 17203200 | consumed tokens: 35232153600 | elapsed time per iteration (s): 0.38 | learning rate: 8.786E-05 | global batch size: 256 | lm loss: 3.010249E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.239 | TFLOPs: 31.84 | 7: iteration 67300/ 115203 | consumed samples: 17228800 | consumed tokens: 35284582400 | elapsed time per iteration (s): 0.38 | learning rate: 8.762E-05 | global batch size: 256 | lm loss: 3.008967E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.461 | TFLOPs: 31.85 | 7: iteration 67400/ 115203 | consumed samples: 17254400 | consumed tokens: 35337011200 | elapsed time per iteration (s): 0.38 | learning rate: 8.738E-05 | global batch size: 256 | lm loss: 3.006199E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.353 | TFLOPs: 31.85 | 7: iteration 67500/ 115203 | consumed samples: 17280000 | consumed tokens: 35389440000 | elapsed time per iteration (s): 0.38 | learning rate: 8.714E-05 | global batch size: 256 | lm loss: 3.006787E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.113 | TFLOPs: 31.84 | 7: 
iteration 67600/ 115203 | consumed samples: 17305600 | consumed tokens: 35441868800 | elapsed time per iteration (s): 0.38 | learning rate: 8.690E-05 | global batch size: 256 | lm loss: 3.009650E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.370 | TFLOPs: 31.85 | 7: iteration 67700/ 115203 | consumed samples: 17331200 | consumed tokens: 35494297600 | elapsed time per iteration (s): 0.38 | learning rate: 8.666E-05 | global batch size: 256 | lm loss: 3.005845E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.560 | TFLOPs: 31.86 | 7: iteration 67800/ 115203 | consumed samples: 17356800 | consumed tokens: 35546726400 | elapsed time per iteration (s): 0.37 | learning rate: 8.642E-05 | global batch size: 256 | lm loss: 3.005152E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.275 | TFLOPs: 31.89 | 7: iteration 67900/ 115203 | consumed samples: 17382400 | consumed tokens: 35599155200 | elapsed time per iteration (s): 0.38 | learning rate: 8.619E-05 | global batch size: 256 | lm loss: 3.007433E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.245 | TFLOPs: 31.80 | 0: [2023-03-17 02:04:10,921] [INFO] [logging.py:68:log_dist] [Rank 0] step=68000, skipped=0, lr=[8.594634403532495e-05, 8.594634403532495e-05, 8.594634403532495e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 68000/ 115203 | consumed samples: 17408000 | consumed tokens: 35651584000 | elapsed time per iteration (s): 0.38 | learning rate: 8.595E-05 | global batch size: 256 | lm loss: 3.009903E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.920 | TFLOPs: 31.83 | 0: steps: 68000 loss: 3.0766 iter time 
(s): 0.374 samples/sec: 685.335 7: iteration 68100/ 115203 | consumed samples: 17433600 | consumed tokens: 35704012800 | elapsed time per iteration (s): 0.38 | learning rate: 8.571E-05 | global batch size: 256 | lm loss: 3.010871E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.136 | TFLOPs: 31.84 | 7: iteration 68200/ 115203 | consumed samples: 17459200 | consumed tokens: 35756441600 | elapsed time per iteration (s): 0.38 | learning rate: 8.547E-05 | global batch size: 256 | lm loss: 3.008124E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.557 | TFLOPs: 31.86 | 7: iteration 68300/ 115203 | consumed samples: 17484800 | consumed tokens: 35808870400 | elapsed time per iteration (s): 0.38 | learning rate: 8.523E-05 | global batch size: 256 | lm loss: 3.007831E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.773 | TFLOPs: 31.82 | 7: iteration 68400/ 115203 | consumed samples: 17510400 | consumed tokens: 35861299200 | elapsed time per iteration (s): 0.38 | learning rate: 8.499E-05 | global batch size: 256 | lm loss: 3.005567E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.634 | TFLOPs: 31.82 | 7: iteration 68500/ 115203 | consumed samples: 17536000 | consumed tokens: 35913728000 | elapsed time per iteration (s): 0.38 | learning rate: 8.475E-05 | global batch size: 256 | lm loss: 3.007859E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.674 | TFLOPs: 31.82 | 7: iteration 68600/ 115203 | consumed samples: 17561600 | consumed tokens: 35966156800 | elapsed time per iteration (s): 0.38 | learning rate: 8.452E-05 | global batch size: 256 | lm loss: 3.004559E+00 | grad norm: 
0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.403 | TFLOPs: 31.85 | 7: iteration 68700/ 115203 | consumed samples: 17587200 | consumed tokens: 36018585600 | elapsed time per iteration (s): 0.37 | learning rate: 8.428E-05 | global batch size: 256 | lm loss: 3.008507E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.394 | TFLOPs: 31.90 | 7: iteration 68800/ 115203 | consumed samples: 17612800 | consumed tokens: 36071014400 | elapsed time per iteration (s): 0.37 | learning rate: 8.404E-05 | global batch size: 256 | lm loss: 3.006619E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.548 | TFLOPs: 31.91 | 7: iteration 68900/ 115203 | consumed samples: 17638400 | consumed tokens: 36123443200 | elapsed time per iteration (s): 0.38 | learning rate: 8.380E-05 | global batch size: 256 | lm loss: 3.006263E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.771 | TFLOPs: 31.78 | 7: iteration 69000/ 115203 | consumed samples: 17664000 | consumed tokens: 36175872000 | elapsed time per iteration (s): 0.38 | learning rate: 8.357E-05 | global batch size: 256 | lm loss: 3.005942E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.433 | TFLOPs: 31.81 | 7: iteration 69100/ 115203 | consumed samples: 17689600 | consumed tokens: 36228300800 | elapsed time per iteration (s): 0.38 | learning rate: 8.333E-05 | global batch size: 256 | lm loss: 3.009442E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.070 | TFLOPs: 31.84 | 7: iteration 69200/ 115203 | consumed samples: 17715200 | consumed tokens: 36280729600 | elapsed time per 
iteration (s): 0.37 | learning rate: 8.309E-05 | global batch size: 256 | lm loss: 3.008139E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.721 | TFLOPs: 31.87 | 7: iteration 69300/ 115203 | consumed samples: 17740800 | consumed tokens: 36333158400 | elapsed time per iteration (s): 0.38 | learning rate: 8.286E-05 | global batch size: 256 | lm loss: 3.009251E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.745 | TFLOPs: 31.68 | 7: iteration 69400/ 115203 | consumed samples: 17766400 | consumed tokens: 36385587200 | elapsed time per iteration (s): 0.38 | learning rate: 8.262E-05 | global batch size: 256 | lm loss: 3.000972E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.922 | TFLOPs: 31.50 | 7: iteration 69500/ 115203 | consumed samples: 17792000 | consumed tokens: 36438016000 | elapsed time per iteration (s): 0.38 | learning rate: 8.238E-05 | global batch size: 256 | lm loss: 3.004930E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.775 | TFLOPs: 31.45 | 7: iteration 69600/ 115203 | consumed samples: 17817600 | consumed tokens: 36490444800 | elapsed time per iteration (s): 0.38 | learning rate: 8.215E-05 | global batch size: 256 | lm loss: 3.003842E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.174 | TFLOPs: 31.33 | 7: iteration 69700/ 115203 | consumed samples: 17843200 | consumed tokens: 36542873600 | elapsed time per iteration (s): 0.38 | learning rate: 8.191E-05 | global batch size: 256 | lm loss: 3.009008E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.459 | TFLOPs: 31.25 | 7: 
iteration 69800/ 115203 | consumed samples: 17868800 | consumed tokens: 36595302400 | elapsed time per iteration (s): 0.38 | learning rate: 8.168E-05 | global batch size: 256 | lm loss: 3.004737E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.552 | TFLOPs: 31.16 | 7: iteration 69900/ 115203 | consumed samples: 17894400 | consumed tokens: 36647731200 | elapsed time per iteration (s): 0.38 | learning rate: 8.144E-05 | global batch size: 256 | lm loss: 3.006162E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.532 | TFLOPs: 31.44 | 0: [2023-03-17 02:16:45,500] [INFO] [logging.py:68:log_dist] [Rank 0] step=70000, skipped=0, lr=[8.120745619091417e-05, 8.120745619091417e-05, 8.120745619091417e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 70000/ 115203 | consumed samples: 17920000 | consumed tokens: 36700160000 | elapsed time per iteration (s): 0.38 | learning rate: 8.121E-05 | global batch size: 256 | lm loss: 3.007398E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.771 | TFLOPs: 31.54 | 0: steps: 70000 loss: 2.9867 iter time (s): 0.375 samples/sec: 682.209 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 70000 | lm loss value: 3.869077E+00 | lm loss PPL: 4.789814E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 70000 to checkpoints_146m60b100m 0: [2023-03-17 02:16:45,635] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step70000 is begin to save! 0: [2023-03-17 02:16:45,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 02:16:45,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:16:45,742] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:16:45,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:16:45,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:16:45,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:16:45,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:16:45,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:16:45,790] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:16:45,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:16:45,806] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_07-model_00-model_states.pt... 0: [2023-03-17 02:16:45,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_07-model_00-model_states.pt. 0: [2023-03-17 02:16:45,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:16:45,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:16:45,836] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_09-model_00-model_states.pt... 0: [2023-03-17 02:16:45,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_09-model_00-model_states.pt. 0: [2023-03-17 02:16:45,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_10-model_00-model_states.pt... 0: [2023-03-17 02:16:45,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_10-model_00-model_states.pt. 0: [2023-03-17 02:16:45,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_11-model_00-model_states.pt... 0: [2023-03-17 02:16:45,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_11-model_00-model_states.pt. 0: [2023-03-17 02:16:45,880] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_12-model_00-model_states.pt... 0: [2023-03-17 02:16:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_12-model_00-model_states.pt. 0: [2023-03-17 02:16:45,895] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_13-model_00-model_states.pt... 0: [2023-03-17 02:16:45,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_13-model_00-model_states.pt. 0: [2023-03-17 02:16:45,910] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_14-model_00-model_states.pt... 
0: [2023-03-17 02:16:45,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_14-model_00-model_states.pt. 0: [2023-03-17 02:16:45,925] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_15-model_00-model_states.pt... 0: [2023-03-17 02:16:45,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_15-model_00-model_states.pt. 0: [2023-03-17 02:16:45,940] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_16-model_00-model_states.pt... 0: [2023-03-17 02:16:45,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_16-model_00-model_states.pt. 0: [2023-03-17 02:16:45,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_17-model_00-model_states.pt... 0: [2023-03-17 02:16:45,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_17-model_00-model_states.pt. 0: [2023-03-17 02:16:45,970] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/layer_19-model_00-model_states.pt... 0: [2023-03-17 02:16:45,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/layer_19-model_00-model_states.pt. 0: [2023-03-17 02:16:45,972] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step70000/mp_rank_00_model_states.pt 0: [2023-03-17 02:16:45,972] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:16:45,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/mp_rank_00_model_states.pt. 
0: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:16:45,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:16:46,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:16:46,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:16:46,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:16:46,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 02:16:46,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:16:46,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:16:46,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 02:16:46,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:16:46,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:16:46,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 02:16:46,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 02:16:46,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 02:16:46,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 02:16:46,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:16:46,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:16:46,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 02:16:46,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:16:46,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:16:46,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 02:16:46,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:16:46,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:16:46,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 02:16:46,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:16:46,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:16:46,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:16:46,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:16:46,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:16:46,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:16:46,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:16:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:16:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
5: [2023-03-17 02:16:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 02:16:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:16:46,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:16:46,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 02:16:46,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:16:46,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:16:46,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 02:16:46,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:16:46,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:16:46,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 02:16:46,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:16:46,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:16:46,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:16:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:16:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:16:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:16:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:16:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt
6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 02:16:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:16:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 02:16:46,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:16:46,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:16:46,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 02:16:46,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:16:46,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:16:46,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:16:46,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:16:46,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:16:46,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:16:46,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 02:16:46,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:16:46,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:16:46,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 02:16:46,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 02:16:46,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 02:16:46,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:16:46,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:16:46,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:16:46,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:16:46,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:16:46,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:16:46,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 02:16:46,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 02:16:46,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 02:16:46,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:16:46,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:16:46,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:16:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:16:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:16:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:16:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
3: [2023-03-17 02:16:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 02:16:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 02:16:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 02:16:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 02:16:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 02:16:46,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:16:46,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:16:46,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:16:46,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:16:46,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:16:46,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 02:16:46,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:16:46,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 02:16:46,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:16:46,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 02:16:46,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:16:46,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 02:16:46,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: successfully saved checkpoint at iteration 70000 to checkpoints_146m60b100m 7: time (ms) | save-checkpoint: 439.96 7: iteration 70100/ 115203 | consumed samples: 17945600 | consumed tokens: 36752588800 | elapsed time per iteration (s): 0.39 | learning rate: 8.097E-05 | global batch size: 256 | lm loss: 3.005222E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.738 | TFLOPs: 30.93 | 7: iteration 70200/ 115203 | consumed samples: 17971200 | consumed tokens: 36805017600 | elapsed time per iteration (s): 0.38 | learning rate: 8.074E-05 | global batch size: 256 | lm loss: 3.005182E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.601 | TFLOPs: 31.35 | 7: iteration 70300/ 115203 | consumed samples: 17996800 | consumed tokens: 36857446400 | elapsed time per iteration (s): 0.39 | learning rate: 8.050E-05 | global batch size: 256 | lm loss: 3.006170E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.189 | TFLOPs: 31.00 | 7: iteration 70400/ 115203 | consumed samples: 18022400 | consumed tokens: 36909875200 | elapsed time per iteration (s): 0.38 | learning rate: 8.027E-05 | global batch size: 256 | lm loss: 2.999294E+00 | 
grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.314 | TFLOPs: 31.19 | 7: iteration 70500/ 115203 | consumed samples: 18048000 | consumed tokens: 36962304000 | elapsed time per iteration (s): 0.38 | learning rate: 8.004E-05 | global batch size: 256 | lm loss: 3.004364E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.374 | TFLOPs: 31.52 | 7: iteration 70600/ 115203 | consumed samples: 18073600 | consumed tokens: 37014732800 | elapsed time per iteration (s): 0.38 | learning rate: 7.980E-05 | global batch size: 256 | lm loss: 3.000490E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.851 | TFLOPs: 31.27 | 7: iteration 70700/ 115203 | consumed samples: 18099200 | consumed tokens: 37067161600 | elapsed time per iteration (s): 0.38 | learning rate: 7.957E-05 | global batch size: 256 | lm loss: 2.999310E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.807 | TFLOPs: 31.31 | 7: iteration 70800/ 115203 | consumed samples: 18124800 | consumed tokens: 37119590400 | elapsed time per iteration (s): 0.38 | learning rate: 7.934E-05 | global batch size: 256 | lm loss: 3.000225E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.433 | TFLOPs: 31.48 | 7: iteration 70900/ 115203 | consumed samples: 18150400 | consumed tokens: 37172019200 | elapsed time per iteration (s): 0.38 | learning rate: 7.910E-05 | global batch size: 256 | lm loss: 3.004517E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.970 | TFLOPs: 31.32 | 7: iteration 71000/ 115203 | consumed samples: 18176000 | consumed tokens: 37224448000 | elapsed time 
per iteration (s): 0.38 | learning rate: 7.887E-05 | global batch size: 256 | lm loss: 3.002937E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.032 | TFLOPs: 31.32 | 7: iteration 71100/ 115203 | consumed samples: 18201600 | consumed tokens: 37276876800 | elapsed time per iteration (s): 0.38 | learning rate: 7.864E-05 | global batch size: 256 | lm loss: 3.002830E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.101 | TFLOPs: 31.42 | 7: iteration 71200/ 115203 | consumed samples: 18227200 | consumed tokens: 37329305600 | elapsed time per iteration (s): 0.38 | learning rate: 7.841E-05 | global batch size: 256 | lm loss: 3.002753E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.129 | TFLOPs: 31.23 | 7: iteration 71300/ 115203 | consumed samples: 18252800 | consumed tokens: 37381734400 | elapsed time per iteration (s): 0.38 | learning rate: 7.817E-05 | global batch size: 256 | lm loss: 3.003486E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.750 | TFLOPs: 31.45 | 7: iteration 71400/ 115203 | consumed samples: 18278400 | consumed tokens: 37434163200 | elapsed time per iteration (s): 0.38 | learning rate: 7.794E-05 | global batch size: 256 | lm loss: 2.999955E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.480 | TFLOPs: 31.25 | 7: iteration 71500/ 115203 | consumed samples: 18304000 | consumed tokens: 37486592000 | elapsed time per iteration (s): 0.38 | learning rate: 7.771E-05 | global batch size: 256 | lm loss: 3.002744E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.413 | TFLOPs: 31.39 | 
7: iteration 71600/ 115203 | consumed samples: 18329600 | consumed tokens: 37539020800 | elapsed time per iteration (s): 0.38 | learning rate: 7.748E-05 | global batch size: 256 | lm loss: 2.999968E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.331 | TFLOPs: 31.38 | 7: iteration 71700/ 115203 | consumed samples: 18355200 | consumed tokens: 37591449600 | elapsed time per iteration (s): 0.38 | learning rate: 7.725E-05 | global batch size: 256 | lm loss: 2.994655E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.555 | TFLOPs: 31.53 | 7: iteration 71800/ 115203 | consumed samples: 18380800 | consumed tokens: 37643878400 | elapsed time per iteration (s): 0.38 | learning rate: 7.702E-05 | global batch size: 256 | lm loss: 2.999268E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.193 | TFLOPs: 31.47 | 7: iteration 71900/ 115203 | consumed samples: 18406400 | consumed tokens: 37696307200 | elapsed time per iteration (s): 0.38 | learning rate: 7.679E-05 | global batch size: 256 | lm loss: 2.996751E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.000 | TFLOPs: 31.46 | 0: [2023-03-17 02:29:28,016] [INFO] [logging.py:68:log_dist] [Rank 0] step=72000, skipped=0, lr=[7.655593093399763e-05, 7.655593093399763e-05, 7.655593093399763e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 72000/ 115203 | consumed samples: 18432000 | consumed tokens: 37748736000 | elapsed time per iteration (s): 0.38 | learning rate: 7.656E-05 | global batch size: 256 | lm loss: 3.001401E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.506 | TFLOPs: 31.58 | 0: steps: 72000 loss: 2.9592 iter 
time (s): 0.379 samples/sec: 674.657 7: iteration 72100/ 115203 | consumed samples: 18457600 | consumed tokens: 37801164800 | elapsed time per iteration (s): 0.38 | learning rate: 7.633E-05 | global batch size: 256 | lm loss: 3.001079E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.960 | TFLOPs: 31.46 | 7: iteration 72200/ 115203 | consumed samples: 18483200 | consumed tokens: 37853593600 | elapsed time per iteration (s): 0.38 | learning rate: 7.610E-05 | global batch size: 256 | lm loss: 2.996525E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.923 | TFLOPs: 31.32 | 7: iteration 72300/ 115203 | consumed samples: 18508800 | consumed tokens: 37906022400 | elapsed time per iteration (s): 0.38 | learning rate: 7.587E-05 | global batch size: 256 | lm loss: 2.991079E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.986 | TFLOPs: 31.27 | 7: iteration 72400/ 115203 | consumed samples: 18534400 | consumed tokens: 37958451200 | elapsed time per iteration (s): 0.38 | learning rate: 7.564E-05 | global batch size: 256 | lm loss: 2.995772E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.742 | TFLOPs: 31.31 | 7: iteration 72500/ 115203 | consumed samples: 18560000 | consumed tokens: 38010880000 | elapsed time per iteration (s): 0.38 | learning rate: 7.541E-05 | global batch size: 256 | lm loss: 2.996054E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.169 | TFLOPs: 31.28 | 7: iteration 72600/ 115203 | consumed samples: 18585600 | consumed tokens: 38063308800 | elapsed time per iteration (s): 0.38 | learning rate: 7.518E-05 | global batch size: 256 | lm loss: 2.998591E+00 | grad 
norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.307 | TFLOPs: 31.57 | 7: iteration 72700/ 115203 | consumed samples: 18611200 | consumed tokens: 38115737600 | elapsed time per iteration (s): 0.38 | learning rate: 7.495E-05 | global batch size: 256 | lm loss: 3.000162E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.986 | TFLOPs: 31.37 | 7: iteration 72800/ 115203 | consumed samples: 18636800 | consumed tokens: 38168166400 | elapsed time per iteration (s): 0.38 | learning rate: 7.472E-05 | global batch size: 256 | lm loss: 3.001420E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.069 | TFLOPs: 31.60 | 7: iteration 72900/ 115203 | consumed samples: 18662400 | consumed tokens: 38220595200 | elapsed time per iteration (s): 0.38 | learning rate: 7.450E-05 | global batch size: 256 | lm loss: 2.996115E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.510 | TFLOPs: 31.62 | 7: iteration 73000/ 115203 | consumed samples: 18688000 | consumed tokens: 38273024000 | elapsed time per iteration (s): 0.38 | learning rate: 7.427E-05 | global batch size: 256 | lm loss: 2.993749E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.283 | TFLOPs: 31.71 | 7: iteration 73100/ 115203 | consumed samples: 18713600 | consumed tokens: 38325452800 | elapsed time per iteration (s): 0.38 | learning rate: 7.404E-05 | global batch size: 256 | lm loss: 2.998109E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.375 | TFLOPs: 31.57 | 7: iteration 73200/ 115203 | consumed samples: 18739200 | consumed tokens: 38377881600 | elapsed time per 
iteration (s): 0.38 | learning rate: 7.381E-05 | global batch size: 256 | lm loss: 2.995750E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.527 | TFLOPs: 31.72 | 7: iteration 73300/ 115203 | consumed samples: 18764800 | consumed tokens: 38430310400 | elapsed time per iteration (s): 0.38 | learning rate: 7.359E-05 | global batch size: 256 | lm loss: 2.995815E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.730 | TFLOPs: 31.73 | 7: iteration 73400/ 115203 | consumed samples: 18790400 | consumed tokens: 38482739200 | elapsed time per iteration (s): 0.38 | learning rate: 7.336E-05 | global batch size: 256 | lm loss: 2.999130E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.760 | TFLOPs: 31.50 | 7: iteration 73500/ 115203 | consumed samples: 18816000 | consumed tokens: 38535168000 | elapsed time per iteration (s): 0.38 | learning rate: 7.313E-05 | global batch size: 256 | lm loss: 2.995415E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.387 | TFLOPs: 31.71 | 7: iteration 73600/ 115203 | consumed samples: 18841600 | consumed tokens: 38587596800 | elapsed time per iteration (s): 0.38 | learning rate: 7.291E-05 | global batch size: 256 | lm loss: 2.997661E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.419 | TFLOPs: 31.71 | 7: iteration 73700/ 115203 | consumed samples: 18867200 | consumed tokens: 38640025600 | elapsed time per iteration (s): 0.38 | learning rate: 7.268E-05 | global batch size: 256 | lm loss: 2.995501E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.394 | TFLOPs: 31.57 | 7: 
iteration 73800/ 115203 | consumed samples: 18892800 | consumed tokens: 38692454400 | elapsed time per iteration (s): 0.38 | learning rate: 7.246E-05 | global batch size: 256 | lm loss: 2.995099E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.060 | TFLOPs: 31.65 | 7: iteration 73900/ 115203 | consumed samples: 18918400 | consumed tokens: 38744883200 | elapsed time per iteration (s): 0.38 | learning rate: 7.223E-05 | global batch size: 256 | lm loss: 2.995841E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.302 | TFLOPs: 31.71 | 0: [2023-03-17 02:42:05,276] [INFO] [logging.py:68:log_dist] [Rank 0] step=74000, skipped=0, lr=[7.20058819630707e-05, 7.20058819630707e-05, 7.20058819630707e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 74000/ 115203 | consumed samples: 18944000 | consumed tokens: 38797312000 | elapsed time per iteration (s): 0.38 | learning rate: 7.201E-05 | global batch size: 256 | lm loss: 2.990284E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.938 | TFLOPs: 31.83 | 0: steps: 74000 loss: 3.0010 iter time (s): 0.377 samples/sec: 678.743 7: iteration 74100/ 115203 | consumed samples: 18969600 | consumed tokens: 38849740800 | elapsed time per iteration (s): 0.38 | learning rate: 7.178E-05 | global batch size: 256 | lm loss: 2.993707E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.311 | TFLOPs: 31.80 | 7: iteration 74200/ 115203 | consumed samples: 18995200 | consumed tokens: 38902169600 | elapsed time per iteration (s): 0.38 | learning rate: 7.156E-05 | global batch size: 256 | lm loss: 2.993644E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 681.421 | TFLOPs: 31.81 | 7: iteration 74300/ 115203 | consumed samples: 19020800 | consumed tokens: 38954598400 | elapsed time per iteration (s): 0.38 | learning rate: 7.133E-05 | global batch size: 256 | lm loss: 3.001241E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.971 | TFLOPs: 31.83 | 7: iteration 74400/ 115203 | consumed samples: 19046400 | consumed tokens: 39007027200 | elapsed time per iteration (s): 0.38 | learning rate: 7.111E-05 | global batch size: 256 | lm loss: 2.996901E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.008 | TFLOPs: 31.83 | 7: iteration 74500/ 115203 | consumed samples: 19072000 | consumed tokens: 39059456000 | elapsed time per iteration (s): 0.38 | learning rate: 7.089E-05 | global batch size: 256 | lm loss: 2.994951E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.992 | TFLOPs: 31.83 | 7: iteration 74600/ 115203 | consumed samples: 19097600 | consumed tokens: 39111884800 | elapsed time per iteration (s): 0.38 | learning rate: 7.066E-05 | global batch size: 256 | lm loss: 2.992970E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.209 | TFLOPs: 31.80 | 7: iteration 74700/ 115203 | consumed samples: 19123200 | consumed tokens: 39164313600 | elapsed time per iteration (s): 0.38 | learning rate: 7.044E-05 | global batch size: 256 | lm loss: 2.996036E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.935 | TFLOPs: 31.83 | 7: iteration 74800/ 115203 | consumed samples: 19148800 | consumed tokens: 39216742400 | elapsed time per iteration (s): 0.38 | learning rate: 7.022E-05 | global batch size: 256 | lm loss: 2.989814E+00 | grad norm: 
0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.774 | TFLOPs: 31.82 | 7: iteration 74900/ 115203 | consumed samples: 19174400 | consumed tokens: 39269171200 | elapsed time per iteration (s): 0.38 | learning rate: 7.000E-05 | global batch size: 256 | lm loss: 2.989238E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.186 | TFLOPs: 31.84 | 7: iteration 75000/ 115203 | consumed samples: 19200000 | consumed tokens: 39321600000 | elapsed time per iteration (s): 0.38 | learning rate: 6.977E-05 | global batch size: 256 | lm loss: 2.994109E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.485 | TFLOPs: 31.86 | 7: iteration 75100/ 115203 | consumed samples: 19225600 | consumed tokens: 39374028800 | elapsed time per iteration (s): 0.38 | learning rate: 6.955E-05 | global batch size: 256 | lm loss: 2.988073E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.981 | TFLOPs: 31.83 | 7: iteration 75200/ 115203 | consumed samples: 19251200 | consumed tokens: 39426457600 | elapsed time per iteration (s): 0.38 | learning rate: 6.933E-05 | global batch size: 256 | lm loss: 2.993457E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.827 | TFLOPs: 31.83 | 7: iteration 75300/ 115203 | consumed samples: 19276800 | consumed tokens: 39478886400 | elapsed time per iteration (s): 0.38 | learning rate: 6.911E-05 | global batch size: 256 | lm loss: 2.991549E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.338 | TFLOPs: 31.85 | 7: iteration 75400/ 115203 | consumed samples: 19302400 | consumed tokens: 39531315200 | elapsed time per 
iteration (s): 0.38 | learning rate: 6.889E-05 | global batch size: 256 | lm loss: 2.991137E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.891 | TFLOPs: 31.83 | 7: iteration 75500/ 115203 | consumed samples: 19328000 | consumed tokens: 39583744000 | elapsed time per iteration (s): 0.38 | learning rate: 6.867E-05 | global batch size: 256 | lm loss: 2.991899E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.398 | TFLOPs: 31.85 | 7: iteration 75600/ 115203 | consumed samples: 19353600 | consumed tokens: 39636172800 | elapsed time per iteration (s): 0.37 | learning rate: 6.845E-05 | global batch size: 256 | lm loss: 2.992591E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.244 | TFLOPs: 31.89 | 7: iteration 75700/ 115203 | consumed samples: 19379200 | consumed tokens: 39688601600 | elapsed time per iteration (s): 0.37 | learning rate: 6.823E-05 | global batch size: 256 | lm loss: 2.988594E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.994 | TFLOPs: 31.88 | 7: iteration 75800/ 115203 | consumed samples: 19404800 | consumed tokens: 39741030400 | elapsed time per iteration (s): 0.38 | learning rate: 6.801E-05 | global batch size: 256 | lm loss: 2.994444E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.947 | TFLOPs: 31.74 | 7: iteration 75900/ 115203 | consumed samples: 19430400 | consumed tokens: 39793459200 | elapsed time per iteration (s): 0.38 | learning rate: 6.779E-05 | global batch size: 256 | lm loss: 2.985750E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.347 | TFLOPs: 31.85 | 0: 
[2023-03-17 02:54:36,214] [INFO] [logging.py:68:log_dist] [Rank 0] step=76000, skipped=0, lr=[6.757111507639708e-05, 6.757111507639708e-05, 6.757111507639708e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 76000/ 115203 | consumed samples: 19456000 | consumed tokens: 39845888000 | elapsed time per iteration (s): 0.38 | learning rate: 6.757E-05 | global batch size: 256 | lm loss: 2.991369E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.069 | TFLOPs: 31.70 | 0: steps: 76000 loss: 3.0127 iter time (s): 0.374 samples/sec: 685.124 7: iteration 76100/ 115203 | consumed samples: 19481600 | consumed tokens: 39898316800 | elapsed time per iteration (s): 0.38 | learning rate: 6.735E-05 | global batch size: 256 | lm loss: 2.992580E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.584 | TFLOPs: 31.77 | 7: iteration 76200/ 115203 | consumed samples: 19507200 | consumed tokens: 39950745600 | elapsed time per iteration (s): 0.38 | learning rate: 6.713E-05 | global batch size: 256 | lm loss: 2.989844E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.625 | TFLOPs: 31.82 | 7: iteration 76300/ 115203 | consumed samples: 19532800 | consumed tokens: 40003174400 | elapsed time per iteration (s): 0.38 | learning rate: 6.692E-05 | global batch size: 256 | lm loss: 2.990569E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.357 | TFLOPs: 31.80 | 7: iteration 76400/ 115203 | consumed samples: 19558400 | consumed tokens: 40055603200 | elapsed time per iteration (s): 0.38 | learning rate: 6.670E-05 | global batch size: 256 | lm loss: 2.992003E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 682.570 | TFLOPs: 31.86 | 7: iteration 76500/ 115203 | consumed samples: 19584000 | consumed tokens: 40108032000 | elapsed time per iteration (s): 0.38 | learning rate: 6.648E-05 | global batch size: 256 | lm loss: 2.991619E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.299 | TFLOPs: 31.85 | 7: iteration 76600/ 115203 | consumed samples: 19609600 | consumed tokens: 40160460800 | elapsed time per iteration (s): 0.38 | learning rate: 6.627E-05 | global batch size: 256 | lm loss: 2.989504E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.658 | TFLOPs: 31.86 | 7: iteration 76700/ 115203 | consumed samples: 19635200 | consumed tokens: 40212889600 | elapsed time per iteration (s): 0.38 | learning rate: 6.605E-05 | global batch size: 256 | lm loss: 2.994782E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.604 | TFLOPs: 31.86 | 7: iteration 76800/ 115203 | consumed samples: 19660800 | consumed tokens: 40265318400 | elapsed time per iteration (s): 0.37 | learning rate: 6.583E-05 | global batch size: 256 | lm loss: 2.988898E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.739 | TFLOPs: 31.87 | 7: iteration 76900/ 115203 | consumed samples: 19686400 | consumed tokens: 40317747200 | elapsed time per iteration (s): 0.38 | learning rate: 6.562E-05 | global batch size: 256 | lm loss: 2.992355E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.770 | TFLOPs: 31.78 | 7: iteration 77000/ 115203 | consumed samples: 19712000 | consumed tokens: 40370176000 | elapsed time per iteration (s): 0.38 | learning rate: 6.540E-05 | global batch size: 256 | lm loss: 2.989778E+00 | grad norm: 
0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.427 | TFLOPs: 31.81 | 7: iteration 77100/ 115203 | consumed samples: 19737600 | consumed tokens: 40422604800 | elapsed time per iteration (s): 0.38 | learning rate: 6.519E-05 | global batch size: 256 | lm loss: 2.989146E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.210 | TFLOPs: 31.80 | 7: iteration 77200/ 115203 | consumed samples: 19763200 | consumed tokens: 40475033600 | elapsed time per iteration (s): 0.38 | learning rate: 6.497E-05 | global batch size: 256 | lm loss: 2.989254E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.562 | TFLOPs: 31.81 | 7: iteration 77300/ 115203 | consumed samples: 19788800 | consumed tokens: 40527462400 | elapsed time per iteration (s): 0.38 | learning rate: 6.476E-05 | global batch size: 256 | lm loss: 2.989318E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.770 | TFLOPs: 31.82 | 7: iteration 77400/ 115203 | consumed samples: 19814400 | consumed tokens: 40579891200 | elapsed time per iteration (s): 0.38 | learning rate: 6.454E-05 | global batch size: 256 | lm loss: 2.990283E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.082 | TFLOPs: 31.79 | 7: iteration 77500/ 115203 | consumed samples: 19840000 | consumed tokens: 40632320000 | elapsed time per iteration (s): 0.37 | learning rate: 6.433E-05 | global batch size: 256 | lm loss: 2.988592E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.691 | TFLOPs: 31.87 | 7: iteration 77600/ 115203 | consumed samples: 19865600 | consumed tokens: 40684748800 | elapsed time per 
iteration (s): 0.38 | learning rate: 6.412E-05 | global batch size: 256 | lm loss: 2.990188E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.992 | TFLOPs: 31.74 | 7: iteration 77700/ 115203 | consumed samples: 19891200 | consumed tokens: 40737177600 | elapsed time per iteration (s): 0.38 | learning rate: 6.390E-05 | global batch size: 256 | lm loss: 2.987324E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.054 | TFLOPs: 31.70 | 7: iteration 77800/ 115203 | consumed samples: 19916800 | consumed tokens: 40789606400 | elapsed time per iteration (s): 0.38 | learning rate: 6.369E-05 | global batch size: 256 | lm loss: 2.986968E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.719 | TFLOPs: 31.73 | 7: iteration 77900/ 115203 | consumed samples: 19942400 | consumed tokens: 40842035200 | elapsed time per iteration (s): 0.38 | learning rate: 6.348E-05 | global batch size: 256 | lm loss: 2.989189E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.990 | TFLOPs: 31.83 | 0: [2023-03-17 03:07:07,459] [INFO] [logging.py:68:log_dist] [Rank 0] step=78000, skipped=0, lr=[6.326508628233516e-05, 6.326508628233516e-05, 6.326508628233516e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 78000/ 115203 | consumed samples: 19968000 | consumed tokens: 40894464000 | elapsed time per iteration (s): 0.37 | learning rate: 6.327E-05 | global batch size: 256 | lm loss: 2.990387E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.031 | TFLOPs: 31.88 | 0: steps: 78000 loss: 2.9752 iter time (s): 0.374 samples/sec: 685.304 7: iteration 78100/ 115203 | consumed samples: 19993600 | consumed 
tokens: 40946892800 | elapsed time per iteration (s): 0.37 | learning rate: 6.305E-05 | global batch size: 256 | lm loss: 2.987387E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.781 | TFLOPs: 31.87 | 7: iteration 78200/ 115203 | consumed samples: 20019200 | consumed tokens: 40999321600 | elapsed time per iteration (s): 0.37 | learning rate: 6.284E-05 | global batch size: 256 | lm loss: 2.986763E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.719 | TFLOPs: 31.87 | 7: iteration 78300/ 115203 | consumed samples: 20044800 | consumed tokens: 41051750400 | elapsed time per iteration (s): 0.37 | learning rate: 6.263E-05 | global batch size: 256 | lm loss: 2.981401E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.299 | TFLOPs: 31.89 | 7: iteration 78400/ 115203 | consumed samples: 20070400 | consumed tokens: 41104179200 | elapsed time per iteration (s): 0.37 | learning rate: 6.242E-05 | global batch size: 256 | lm loss: 2.986788E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.422 | TFLOPs: 31.90 | 7: iteration 78500/ 115203 | consumed samples: 20096000 | consumed tokens: 41156608000 | elapsed time per iteration (s): 0.37 | learning rate: 6.221E-05 | global batch size: 256 | lm loss: 2.986251E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.435 | TFLOPs: 31.90 | 7: iteration 78600/ 115203 | consumed samples: 20121600 | consumed tokens: 41209036800 | elapsed time per iteration (s): 0.37 | learning rate: 6.200E-05 | global batch size: 256 | lm loss: 2.983145E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 683.612 | TFLOPs: 31.91 | 7: iteration 78700/ 115203 | consumed samples: 20147200 | consumed tokens: 41261465600 | elapsed time per iteration (s): 0.37 | learning rate: 6.179E-05 | global batch size: 256 | lm loss: 2.983262E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.556 | TFLOPs: 31.91 | 7: iteration 78800/ 115203 | consumed samples: 20172800 | consumed tokens: 41313894400 | elapsed time per iteration (s): 0.37 | learning rate: 6.158E-05 | global batch size: 256 | lm loss: 2.986954E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.983 | TFLOPs: 31.88 | 7: iteration 78900/ 115203 | consumed samples: 20198400 | consumed tokens: 41366323200 | elapsed time per iteration (s): 0.37 | learning rate: 6.137E-05 | global batch size: 256 | lm loss: 2.984932E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.124 | TFLOPs: 31.89 | 7: iteration 79000/ 115203 | consumed samples: 20224000 | consumed tokens: 41418752000 | elapsed time per iteration (s): 0.37 | learning rate: 6.116E-05 | global batch size: 256 | lm loss: 2.982396E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.872 | TFLOPs: 31.87 | 7: iteration 79100/ 115203 | consumed samples: 20249600 | consumed tokens: 41471180800 | elapsed time per iteration (s): 0.37 | learning rate: 6.096E-05 | global batch size: 256 | lm loss: 2.985192E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.051 | TFLOPs: 31.88 | 7: iteration 79200/ 115203 | consumed samples: 20275200 | consumed tokens: 41523609600 | elapsed time per iteration (s): 0.38 | learning rate: 6.075E-05 | global batch size: 256 | lm loss: 2.985985E+00 | grad norm: 
0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.078 | TFLOPs: 31.84 | 7: iteration 79300/ 115203 | consumed samples: 20300800 | consumed tokens: 41576038400 | elapsed time per iteration (s): 0.37 | learning rate: 6.054E-05 | global batch size: 256 | lm loss: 2.990179E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.943 | TFLOPs: 31.88 | 7: iteration 79400/ 115203 | consumed samples: 20326400 | consumed tokens: 41628467200 | elapsed time per iteration (s): 0.37 | learning rate: 6.033E-05 | global batch size: 256 | lm loss: 2.987198E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.071 | TFLOPs: 31.88 | 7: iteration 79500/ 115203 | consumed samples: 20352000 | consumed tokens: 41680896000 | elapsed time per iteration (s): 0.37 | learning rate: 6.013E-05 | global batch size: 256 | lm loss: 2.987628E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.084 | TFLOPs: 31.88 | 7: iteration 79600/ 115203 | consumed samples: 20377600 | consumed tokens: 41733324800 | elapsed time per iteration (s): 0.38 | learning rate: 5.992E-05 | global batch size: 256 | lm loss: 2.985708E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.208 | TFLOPs: 31.80 | 7: iteration 79700/ 115203 | consumed samples: 20403200 | consumed tokens: 41785753600 | elapsed time per iteration (s): 0.38 | learning rate: 5.972E-05 | global batch size: 256 | lm loss: 2.984028E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.604 | TFLOPs: 31.63 | 7: iteration 79800/ 115203 | consumed samples: 20428800 | consumed tokens: 41838182400 | elapsed time per 
iteration (s): 0.38 | learning rate: 5.951E-05 | global batch size: 256 | lm loss: 2.984569E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.377 | TFLOPs: 31.76 | 7: iteration 79900/ 115203 | consumed samples: 20454400 | consumed tokens: 41890611200 | elapsed time per iteration (s): 0.38 | learning rate: 5.931E-05 | global batch size: 256 | lm loss: 2.984146E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.441 | TFLOPs: 31.76 | 0: [2023-03-17 03:19:38,414] [INFO] [logging.py:68:log_dist] [Rank 0] step=80000, skipped=0, lr=[5.910086097100006e-05, 5.910086097100006e-05, 5.910086097100006e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 80000/ 115203 | consumed samples: 20480000 | consumed tokens: 41943040000 | elapsed time per iteration (s): 0.38 | learning rate: 5.910E-05 | global batch size: 256 | lm loss: 2.989536E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.572 | TFLOPs: 31.30 | 0: steps: 80000 loss: 2.9558 iter time (s): 0.374 samples/sec: 685.401 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 80000 | lm loss value: 3.868458E+00 | lm loss PPL: 4.786849E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 80000 to checkpoints_146m60b100m 0: [2023-03-17 03:19:38,545] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step80000 is begin to save! 0: [2023-03-17 03:19:38,549] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 03:19:38,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:19:38,642] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:19:38,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:19:38,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:19:38,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:19:38,674] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:19:38,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:19:38,689] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:19:38,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:19:38,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_07-model_00-model_states.pt... 0: [2023-03-17 03:19:38,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_07-model_00-model_states.pt. 0: [2023-03-17 03:19:38,718] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:19:38,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:19:38,733] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_09-model_00-model_states.pt... 0: [2023-03-17 03:19:38,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_09-model_00-model_states.pt. 0: [2023-03-17 03:19:38,748] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_10-model_00-model_states.pt... 0: [2023-03-17 03:19:38,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_10-model_00-model_states.pt. 0: [2023-03-17 03:19:38,763] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_11-model_00-model_states.pt... 0: [2023-03-17 03:19:38,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_11-model_00-model_states.pt. 0: [2023-03-17 03:19:38,778] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_12-model_00-model_states.pt... 0: [2023-03-17 03:19:38,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_12-model_00-model_states.pt. 0: [2023-03-17 03:19:38,794] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_13-model_00-model_states.pt... 0: [2023-03-17 03:19:38,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_13-model_00-model_states.pt. 0: [2023-03-17 03:19:38,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_14-model_00-model_states.pt... 
0: [2023-03-17 03:19:38,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_14-model_00-model_states.pt. 0: [2023-03-17 03:19:38,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_15-model_00-model_states.pt... 0: [2023-03-17 03:19:38,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_15-model_00-model_states.pt. 0: [2023-03-17 03:19:38,840] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_16-model_00-model_states.pt... 0: [2023-03-17 03:19:38,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_16-model_00-model_states.pt. 0: [2023-03-17 03:19:38,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_17-model_00-model_states.pt... 0: [2023-03-17 03:19:38,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_17-model_00-model_states.pt. 0: [2023-03-17 03:19:38,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/layer_19-model_00-model_states.pt... 0: [2023-03-17 03:19:38,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/layer_19-model_00-model_states.pt. 0: [2023-03-17 03:19:38,871] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step80000/mp_rank_00_model_states.pt 0: [2023-03-17 03:19:38,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:19:38,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/mp_rank_00_model_states.pt. 
0: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:38,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:38,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:38,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:19:38,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:38,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 03:19:38,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:38,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:38,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 03:19:38,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:38,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:38,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 03:19:38,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:38,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:38,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 03:19:38,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:19:38,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:38,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 03:19:38,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:38,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:38,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:38,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 03:19:38,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:38,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 03:19:38,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:38,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:38,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 03:19:38,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:19:38,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:38,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:38,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:38,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 03:19:38,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 03:19:38,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:38,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:38,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 03:19:38,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:38,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:38,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 03:19:38,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:19:38,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:38,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 03:19:38,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:38,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:38,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 03:19:38,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:38,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:38,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 03:19:38,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:38,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:38,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 03:19:38,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:19:38,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:38,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:38,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:38,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:38,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:38,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:38,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:38,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 03:19:38,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 03:19:38,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 03:19:38,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 03:19:38,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:19:38,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:38,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 03:19:38,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:38,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:38,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 03:19:38,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:38,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:38,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 03:19:38,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:38,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:38,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:19:38,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:38,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:38,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:38,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:38,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:38,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:38,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:38,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 03:19:38,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 03:19:38,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 03:19:38,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 03:19:38,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
3: [2023-03-17 03:19:38,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:38,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:38,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:38,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:38,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:38,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:38,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 03:19:38,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 03:19:38,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 03:19:38,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:38,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
1: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 03:19:38,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 03:19:38,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:19:38,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:38,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:38,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:38,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:38,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:38,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:38,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:38,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 03:19:38,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:38,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:38,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:38,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:38,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:38,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 03:19:38,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:38,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
5: [2023-03-17 03:19:38,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 03:19:38,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: successfully saved checkpoint at iteration 80000 to checkpoints_146m60b100m 7: time (ms) | save-checkpoint: 434.30 7: iteration 80100/ 115203 | consumed samples: 20505600 | consumed tokens: 41995468800 | elapsed time per iteration (s): 0.38 | learning rate: 5.890E-05 | global batch size: 256 | lm loss: 2.983285E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.555 | TFLOPs: 31.16 | 7: iteration 80200/ 115203 | consumed samples: 20531200 | consumed tokens: 42047897600 | elapsed time per iteration (s): 0.38 | learning rate: 5.869E-05 | global batch size: 256 | lm loss: 2.985786E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.304 | TFLOPs: 31.66 | 7: iteration 80300/ 115203 | consumed samples: 20556800 | consumed tokens: 42100326400 | elapsed time per iteration (s): 0.38 | learning rate: 5.849E-05 | global batch size: 256 | lm loss: 2.976292E+00 | grad norm: 0.296 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.490 | TFLOPs: 31.48 | 7: iteration 80400/ 115203 | consumed samples: 20582400 | consumed tokens: 42152755200 | elapsed time per iteration (s): 0.38 | learning rate: 5.829E-05 | global batch size: 256 | lm loss: 2.981631E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.337 | TFLOPs: 31.15 | 7: iteration 80500/ 115203 | consumed samples: 20608000 | consumed tokens: 42205184000 | elapsed time per iteration (s): 0.38 | learning rate: 5.808E-05 | global batch size: 256 | lm loss: 2.981618E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.731 | TFLOPs: 31.21 | 7: iteration 80600/ 115203 | consumed samples: 20633600 | consumed tokens: 42257612800 | elapsed time per iteration (s): 0.38 | learning rate: 5.788E-05 | global batch size: 256 | lm loss: 2.986088E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.731 | TFLOPs: 31.40 | 7: iteration 80700/ 115203 | consumed samples: 20659200 | consumed tokens: 42310041600 | elapsed time per iteration (s): 0.39 | learning rate: 5.768E-05 | global batch size: 256 | lm loss: 2.979868E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.355 | TFLOPs: 31.01 | 7: iteration 80800/ 115203 | consumed samples: 20684800 | consumed tokens: 42362470400 | elapsed time per iteration (s): 0.38 | learning rate: 5.748E-05 | global batch size: 256 | lm loss: 2.985775E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.372 | TFLOPs: 31.48 | 7: iteration 80900/ 115203 | consumed samples: 20710400 | consumed tokens: 42414899200 | elapsed time per iteration (s): 0.38 | learning 
rate: 5.728E-05 | global batch size: 256 | lm loss: 2.986025E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.773 | TFLOPs: 31.50 | 7: iteration 81000/ 115203 | consumed samples: 20736000 | consumed tokens: 42467328000 | elapsed time per iteration (s): 0.38 | learning rate: 5.708E-05 | global batch size: 256 | lm loss: 2.983038E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.134 | TFLOPs: 31.28 | 7: iteration 81100/ 115203 | consumed samples: 20761600 | consumed tokens: 42519756800 | elapsed time per iteration (s): 0.38 | learning rate: 5.688E-05 | global batch size: 256 | lm loss: 2.980871E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.020 | TFLOPs: 31.32 | 7: iteration 81200/ 115203 | consumed samples: 20787200 | consumed tokens: 42572185600 | elapsed time per iteration (s): 0.38 | learning rate: 5.668E-05 | global batch size: 256 | lm loss: 2.986044E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.356 | TFLOPs: 31.48 | 7: iteration 81300/ 115203 | consumed samples: 20812800 | consumed tokens: 42624614400 | elapsed time per iteration (s): 0.38 | learning rate: 5.648E-05 | global batch size: 256 | lm loss: 2.981158E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.467 | TFLOPs: 31.43 | 7: iteration 81400/ 115203 | consumed samples: 20838400 | consumed tokens: 42677043200 | elapsed time per iteration (s): 0.38 | learning rate: 5.628E-05 | global batch size: 256 | lm loss: 2.982794E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.105 | TFLOPs: 31.23 | 7: iteration 81500/ 115203 | 
consumed samples: 20864000 | consumed tokens: 42729472000 | elapsed time per iteration (s): 0.38 | learning rate: 5.608E-05 | global batch size: 256 | lm loss: 2.977894E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.214 | TFLOPs: 31.52 | 7: iteration 81600/ 115203 | consumed samples: 20889600 | consumed tokens: 42781900800 | elapsed time per iteration (s): 0.38 | learning rate: 5.588E-05 | global batch size: 256 | lm loss: 2.983365E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.638 | TFLOPs: 31.49 | 7: iteration 81700/ 115203 | consumed samples: 20915200 | consumed tokens: 42834329600 | elapsed time per iteration (s): 0.38 | learning rate: 5.568E-05 | global batch size: 256 | lm loss: 2.981950E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.395 | TFLOPs: 31.34 | 7: iteration 81800/ 115203 | consumed samples: 20940800 | consumed tokens: 42886758400 | elapsed time per iteration (s): 0.38 | learning rate: 5.548E-05 | global batch size: 256 | lm loss: 2.980502E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.517 | TFLOPs: 31.11 | 7: iteration 81900/ 115203 | consumed samples: 20966400 | consumed tokens: 42939187200 | elapsed time per iteration (s): 0.38 | learning rate: 5.529E-05 | global batch size: 256 | lm loss: 2.983302E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.728 | TFLOPs: 31.17 | 0: [2023-03-17 03:32:21,244] [INFO] [logging.py:68:log_dist] [Rank 0] step=82000, skipped=0, lr=[5.5091074271143155e-05, 5.5091074271143155e-05, 5.5091074271143155e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 82000/ 115203 | consumed samples: 20992000 | 
consumed tokens: 42991616000 | elapsed time per iteration (s): 0.38 | learning rate: 5.509E-05 | global batch size: 256 | lm loss: 2.982803E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.869 | TFLOPs: 31.17 | 0: steps: 82000 loss: 3.0305 iter time (s): 0.379 samples/sec: 675.249 7: iteration 82100/ 115203 | consumed samples: 21017600 | consumed tokens: 43044044800 | elapsed time per iteration (s): 0.38 | learning rate: 5.489E-05 | global batch size: 256 | lm loss: 2.982245E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.192 | TFLOPs: 31.38 | 7: iteration 82200/ 115203 | consumed samples: 21043200 | consumed tokens: 43096473600 | elapsed time per iteration (s): 0.38 | learning rate: 5.470E-05 | global batch size: 256 | lm loss: 2.980832E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.928 | TFLOPs: 31.46 | 7: iteration 82300/ 115203 | consumed samples: 21068800 | consumed tokens: 43148902400 | elapsed time per iteration (s): 0.39 | learning rate: 5.450E-05 | global batch size: 256 | lm loss: 2.982529E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 660.980 | TFLOPs: 30.85 | 7: iteration 82400/ 115203 | consumed samples: 21094400 | consumed tokens: 43201331200 | elapsed time per iteration (s): 0.38 | learning rate: 5.431E-05 | global batch size: 256 | lm loss: 2.983750E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.044 | TFLOPs: 31.46 | 7: iteration 82500/ 115203 | consumed samples: 21120000 | consumed tokens: 43253760000 | elapsed time per iteration (s): 0.38 | learning rate: 5.411E-05 | global batch size: 256 | lm loss: 2.983672E+00 | grad norm: 0.290 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.671 | TFLOPs: 31.30 | 7: iteration 82600/ 115203 | consumed samples: 21145600 | consumed tokens: 43306188800 | elapsed time per iteration (s): 0.38 | learning rate: 5.392E-05 | global batch size: 256 | lm loss: 2.981022E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.014 | TFLOPs: 31.32 | 7: iteration 82700/ 115203 | consumed samples: 21171200 | consumed tokens: 43358617600 | elapsed time per iteration (s): 0.38 | learning rate: 5.373E-05 | global batch size: 256 | lm loss: 2.980114E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.270 | TFLOPs: 31.33 | 7: iteration 82800/ 115203 | consumed samples: 21196800 | consumed tokens: 43411046400 | elapsed time per iteration (s): 0.38 | learning rate: 5.353E-05 | global batch size: 256 | lm loss: 2.979977E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.836 | TFLOPs: 31.45 | 7: iteration 82900/ 115203 | consumed samples: 21222400 | consumed tokens: 43463475200 | elapsed time per iteration (s): 0.38 | learning rate: 5.334E-05 | global batch size: 256 | lm loss: 2.981185E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.035 | TFLOPs: 31.27 | 7: iteration 83000/ 115203 | consumed samples: 21248000 | consumed tokens: 43515904000 | elapsed time per iteration (s): 0.38 | learning rate: 5.315E-05 | global batch size: 256 | lm loss: 2.983398E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.932 | TFLOPs: 31.32 | 7: iteration 83100/ 115203 | consumed samples: 21273600 | consumed tokens: 43568332800 | elapsed time per iteration (s): 0.38 | 
learning rate: 5.296E-05 | global batch size: 256 | lm loss: 2.977886E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 666.285 | TFLOPs: 31.10 | 7: iteration 83200/ 115203 | consumed samples: 21299200 | consumed tokens: 43620761600 | elapsed time per iteration (s): 0.38 | learning rate: 5.276E-05 | global batch size: 256 | lm loss: 2.979141E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.617 | TFLOPs: 31.44 | 7: iteration 83300/ 115203 | consumed samples: 21324800 | consumed tokens: 43673190400 | elapsed time per iteration (s): 0.38 | learning rate: 5.257E-05 | global batch size: 256 | lm loss: 2.984587E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.869 | TFLOPs: 31.64 | 7: iteration 83400/ 115203 | consumed samples: 21350400 | consumed tokens: 43725619200 | elapsed time per iteration (s): 0.38 | learning rate: 5.238E-05 | global batch size: 256 | lm loss: 2.982956E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.768 | TFLOPs: 31.22 | 7: iteration 83500/ 115203 | consumed samples: 21376000 | consumed tokens: 43778048000 | elapsed time per iteration (s): 0.38 | learning rate: 5.219E-05 | global batch size: 256 | lm loss: 2.979109E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.276 | TFLOPs: 31.66 | 7: iteration 83600/ 115203 | consumed samples: 21401600 | consumed tokens: 43830476800 | elapsed time per iteration (s): 0.38 | learning rate: 5.200E-05 | global batch size: 256 | lm loss: 2.979095E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.891 | TFLOPs: 31.69 | 7: iteration 83700/ 115203 
| consumed samples: 21427200 | consumed tokens: 43882905600 | elapsed time per iteration (s): 0.38 | learning rate: 5.181E-05 | global batch size: 256 | lm loss: 2.979442E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.245 | TFLOPs: 31.75 | 7: iteration 83800/ 115203 | consumed samples: 21452800 | consumed tokens: 43935334400 | elapsed time per iteration (s): 0.38 | learning rate: 5.162E-05 | global batch size: 256 | lm loss: 2.976125E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.028 | TFLOPs: 31.74 | 7: iteration 83900/ 115203 | consumed samples: 21478400 | consumed tokens: 43987763200 | elapsed time per iteration (s): 0.37 | learning rate: 5.144E-05 | global batch size: 256 | lm loss: 2.979236E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.076 | TFLOPs: 31.88 | 0: [2023-03-17 03:45:01,601] [INFO] [logging.py:68:log_dist] [Rank 0] step=84000, skipped=0, lr=[5.124789271253415e-05, 5.124789271253415e-05, 5.124789271253415e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 84000/ 115203 | consumed samples: 21504000 | consumed tokens: 44040192000 | elapsed time per iteration (s): 0.38 | learning rate: 5.125E-05 | global batch size: 256 | lm loss: 2.980167E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.208 | TFLOPs: 31.38 | 0: steps: 84000 loss: 2.9801 iter time (s): 0.378 samples/sec: 677.287 7: iteration 84100/ 115203 | consumed samples: 21529600 | consumed tokens: 44092620800 | elapsed time per iteration (s): 0.38 | learning rate: 5.106E-05 | global batch size: 256 | lm loss: 2.980221E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.858 | TFLOPs: 
31.59 | 7: iteration 84200/ 115203 | consumed samples: 21555200 | consumed tokens: 44145049600 | elapsed time per iteration (s): 0.38 | learning rate: 5.087E-05 | global batch size: 256 | lm loss: 2.975824E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.948 | TFLOPs: 31.74 | 7: iteration 84300/ 115203 | consumed samples: 21580800 | consumed tokens: 44197478400 | elapsed time per iteration (s): 0.38 | learning rate: 5.069E-05 | global batch size: 256 | lm loss: 2.978568E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.942 | TFLOPs: 31.69 | 7: iteration 84400/ 115203 | consumed samples: 21606400 | consumed tokens: 44249907200 | elapsed time per iteration (s): 0.38 | learning rate: 5.050E-05 | global batch size: 256 | lm loss: 2.977287E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.090 | TFLOPs: 31.56 | 7: iteration 84500/ 115203 | consumed samples: 21632000 | consumed tokens: 44302336000 | elapsed time per iteration (s): 0.38 | learning rate: 5.031E-05 | global batch size: 256 | lm loss: 2.976412E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.195 | TFLOPs: 31.70 | 7: iteration 84600/ 115203 | consumed samples: 21657600 | consumed tokens: 44354764800 | elapsed time per iteration (s): 0.38 | learning rate: 5.013E-05 | global batch size: 256 | lm loss: 2.979340E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.343 | TFLOPs: 31.80 | 7: iteration 84700/ 115203 | consumed samples: 21683200 | consumed tokens: 44407193600 | elapsed time per iteration (s): 0.38 | learning rate: 4.994E-05 | global batch size: 256 | lm loss: 2.978045E+00 | grad norm: 0.278 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.323 | TFLOPs: 31.66 | 7: iteration 84800/ 115203 | consumed samples: 21708800 | consumed tokens: 44459622400 | elapsed time per iteration (s): 0.38 | learning rate: 4.976E-05 | global batch size: 256 | lm loss: 2.980385E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.338 | TFLOPs: 31.62 | 7: iteration 84900/ 115203 | consumed samples: 21734400 | consumed tokens: 44512051200 | elapsed time per iteration (s): 0.37 | learning rate: 4.958E-05 | global batch size: 256 | lm loss: 2.974605E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.682 | TFLOPs: 31.87 | 7: iteration 85000/ 115203 | consumed samples: 21760000 | consumed tokens: 44564480000 | elapsed time per iteration (s): 0.38 | learning rate: 4.939E-05 | global batch size: 256 | lm loss: 2.978444E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.263 | TFLOPs: 31.85 | 7: iteration 85100/ 115203 | consumed samples: 21785600 | consumed tokens: 44616908800 | elapsed time per iteration (s): 0.37 | learning rate: 4.921E-05 | global batch size: 256 | lm loss: 2.973560E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.575 | TFLOPs: 31.91 | 7: iteration 85200/ 115203 | consumed samples: 21811200 | consumed tokens: 44669337600 | elapsed time per iteration (s): 0.38 | learning rate: 4.903E-05 | global batch size: 256 | lm loss: 2.974979E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.928 | TFLOPs: 31.83 | 7: iteration 85300/ 115203 | consumed samples: 21836800 | consumed tokens: 44721766400 | elapsed time per iteration (s): 0.38 | learning 
rate: 4.884E-05 | global batch size: 256 | lm loss: 2.975335E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.092 | TFLOPs: 31.74 | 7: iteration 85400/ 115203 | consumed samples: 21862400 | consumed tokens: 44774195200 | elapsed time per iteration (s): 0.38 | learning rate: 4.866E-05 | global batch size: 256 | lm loss: 2.978415E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.543 | TFLOPs: 31.77 | 7: iteration 85500/ 115203 | consumed samples: 21888000 | consumed tokens: 44826624000 | elapsed time per iteration (s): 0.38 | learning rate: 4.848E-05 | global batch size: 256 | lm loss: 2.976263E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.447 | TFLOPs: 31.76 | 7: iteration 85600/ 115203 | consumed samples: 21913600 | consumed tokens: 44879052800 | elapsed time per iteration (s): 0.38 | learning rate: 4.830E-05 | global batch size: 256 | lm loss: 2.969589E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.053 | TFLOPs: 31.79 | 7: iteration 85700/ 115203 | consumed samples: 21939200 | consumed tokens: 44931481600 | elapsed time per iteration (s): 0.37 | learning rate: 4.812E-05 | global batch size: 256 | lm loss: 2.977551E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.802 | TFLOPs: 31.87 | 7: iteration 85800/ 115203 | consumed samples: 21964800 | consumed tokens: 44983910400 | elapsed time per iteration (s): 0.38 | learning rate: 4.794E-05 | global batch size: 256 | lm loss: 2.974294E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.819 | TFLOPs: 31.73 | 7: iteration 85900/ 115203 | 
consumed samples: 21990400 | consumed tokens: 45036339200 | elapsed time per iteration (s): 0.37 | learning rate: 4.776E-05 | global batch size: 256 | lm loss: 2.979622E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.762 | TFLOPs: 31.92 | 0: [2023-03-17 03:57:34,041] [INFO] [logging.py:68:log_dist] [Rank 0] step=86000, skipped=0, lr=[4.7582977310170454e-05, 4.7582977310170454e-05, 4.7582977310170454e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 86000/ 115203 | consumed samples: 22016000 | consumed tokens: 45088768000 | elapsed time per iteration (s): 0.38 | learning rate: 4.758E-05 | global batch size: 256 | lm loss: 2.977291E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.190 | TFLOPs: 31.84 | 0: steps: 86000 loss: 2.9418 iter time (s): 0.374 samples/sec: 684.277 7: iteration 86100/ 115203 | consumed samples: 22041600 | consumed tokens: 45141196800 | elapsed time per iteration (s): 0.38 | learning rate: 4.740E-05 | global batch size: 256 | lm loss: 2.978386E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.793 | TFLOPs: 31.64 | 7: iteration 86200/ 115203 | consumed samples: 22067200 | consumed tokens: 45193625600 | elapsed time per iteration (s): 0.38 | learning rate: 4.723E-05 | global batch size: 256 | lm loss: 2.972344E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.531 | TFLOPs: 31.62 | 7: iteration 86300/ 115203 | consumed samples: 22092800 | consumed tokens: 45246054400 | elapsed time per iteration (s): 0.38 | learning rate: 4.705E-05 | global batch size: 256 | lm loss: 2.981466E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.115 | 
TFLOPs: 31.61 | 7: iteration 86400/ 115203 | consumed samples: 22118400 | consumed tokens: 45298483200 | elapsed time per iteration (s): 0.38 | learning rate: 4.687E-05 | global batch size: 256 | lm loss: 2.976015E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.644 | TFLOPs: 31.72 | 7: iteration 86500/ 115203 | consumed samples: 22144000 | consumed tokens: 45350912000 | elapsed time per iteration (s): 0.38 | learning rate: 4.670E-05 | global batch size: 256 | lm loss: 2.974534E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.285 | TFLOPs: 31.71 | 7: iteration 86600/ 115203 | consumed samples: 22169600 | consumed tokens: 45403340800 | elapsed time per iteration (s): 0.38 | learning rate: 4.652E-05 | global batch size: 256 | lm loss: 2.976324E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.554 | TFLOPs: 31.72 | 7: iteration 86700/ 115203 | consumed samples: 22195200 | consumed tokens: 45455769600 | elapsed time per iteration (s): 0.38 | learning rate: 4.634E-05 | global batch size: 256 | lm loss: 2.972384E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.760 | TFLOPs: 31.68 | 7: iteration 86800/ 115203 | consumed samples: 22220800 | consumed tokens: 45508198400 | elapsed time per iteration (s): 0.38 | learning rate: 4.617E-05 | global batch size: 256 | lm loss: 2.977368E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.478 | TFLOPs: 31.76 | 7: iteration 86900/ 115203 | consumed samples: 22246400 | consumed tokens: 45560627200 | elapsed time per iteration (s): 0.38 | learning rate: 4.599E-05 | global batch size: 256 | lm loss: 2.975264E+00 | grad norm: 0.288 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.989 | TFLOPs: 31.83 | 7: iteration 87000/ 115203 | consumed samples: 22272000 | consumed tokens: 45613056000 | elapsed time per iteration (s): 0.38 | learning rate: 4.582E-05 | global batch size: 256 | lm loss: 2.969675E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.986 | TFLOPs: 31.55 | 7: iteration 87100/ 115203 | consumed samples: 22297600 | consumed tokens: 45665484800 | elapsed time per iteration (s): 0.38 | learning rate: 4.565E-05 | global batch size: 256 | lm loss: 2.971619E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.268 | TFLOPs: 31.47 | 7: iteration 87200/ 115203 | consumed samples: 22323200 | consumed tokens: 45717913600 | elapsed time per iteration (s): 0.38 | learning rate: 4.547E-05 | global batch size: 256 | lm loss: 2.974310E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.912 | TFLOPs: 31.64 | 7: iteration 87300/ 115203 | consumed samples: 22348800 | consumed tokens: 45770342400 | elapsed time per iteration (s): 0.38 | learning rate: 4.530E-05 | global batch size: 256 | lm loss: 2.971897E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.908 | TFLOPs: 31.74 | 7: iteration 87400/ 115203 | consumed samples: 22374400 | consumed tokens: 45822771200 | elapsed time per iteration (s): 0.38 | learning rate: 4.513E-05 | global batch size: 256 | lm loss: 2.975165E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.038 | TFLOPs: 31.74 | 7: iteration 87500/ 115203 | consumed samples: 22400000 | consumed tokens: 45875200000 | elapsed time per iteration (s): 0.38 | 
learning rate: 4.496E-05 | global batch size: 256 | lm loss: 2.972245E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.557 | TFLOPs: 31.58 | 7: iteration 87600/ 115203 | consumed samples: 22425600 | consumed tokens: 45927628800 | elapsed time per iteration (s): 0.38 | learning rate: 4.479E-05 | global batch size: 256 | lm loss: 2.976190E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.257 | TFLOPs: 31.61 | 7: iteration 87700/ 115203 | consumed samples: 22451200 | consumed tokens: 45980057600 | elapsed time per iteration (s): 0.38 | learning rate: 4.462E-05 | global batch size: 256 | lm loss: 2.974087E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.979 | TFLOPs: 31.79 | 7: iteration 87800/ 115203 | consumed samples: 22476800 | consumed tokens: 46032486400 | elapsed time per iteration (s): 0.38 | learning rate: 4.445E-05 | global batch size: 256 | lm loss: 2.969703E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.408 | TFLOPs: 31.67 | 7: iteration 87900/ 115203 | consumed samples: 22502400 | consumed tokens: 46084915200 | elapsed time per iteration (s): 0.38 | learning rate: 4.428E-05 | global batch size: 256 | lm loss: 2.970508E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.983 | TFLOPs: 31.65 | 0: [2023-03-17 04:10:08,708] [INFO] [logging.py:68:log_dist] [Rank 0] step=88000, skipped=0, lr=[4.410744818232367e-05, 4.410744818232367e-05, 4.410744818232367e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 88000/ 115203 | consumed samples: 22528000 | consumed tokens: 46137344000 | elapsed time per iteration (s): 0.38 | learning rate: 4.411E-05 | global 
batch size: 256 | lm loss: 2.969919E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.558 | TFLOPs: 31.63 | 0: steps: 88000 loss: 2.9387 iter time (s): 0.375 samples/sec: 682.185 7: iteration 88100/ 115203 | consumed samples: 22553600 | consumed tokens: 46189772800 | elapsed time per iteration (s): 0.38 | learning rate: 4.394E-05 | global batch size: 256 | lm loss: 2.971986E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.170 | TFLOPs: 31.65 | 7: iteration 88200/ 115203 | consumed samples: 22579200 | consumed tokens: 46242201600 | elapsed time per iteration (s): 0.38 | learning rate: 4.377E-05 | global batch size: 256 | lm loss: 2.969178E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.969 | TFLOPs: 31.74 | 7: iteration 88300/ 115203 | consumed samples: 22604800 | consumed tokens: 46294630400 | elapsed time per iteration (s): 0.38 | learning rate: 4.360E-05 | global batch size: 256 | lm loss: 2.972348E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.871 | TFLOPs: 31.64 | 7: iteration 88400/ 115203 | consumed samples: 22630400 | consumed tokens: 46347059200 | elapsed time per iteration (s): 0.38 | learning rate: 4.344E-05 | global batch size: 256 | lm loss: 2.970686E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.985 | TFLOPs: 31.74 | 7: iteration 88500/ 115203 | consumed samples: 22656000 | consumed tokens: 46399488000 | elapsed time per iteration (s): 0.38 | learning rate: 4.327E-05 | global batch size: 256 | lm loss: 2.974368E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.995 | 
TFLOPs: 31.65 | 7: iteration 88600/ 115203 | consumed samples: 22681600 | consumed tokens: 46451916800 | elapsed time per iteration (s): 0.38 | learning rate: 4.310E-05 | global batch size: 256 | lm loss: 2.967416E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.998 | TFLOPs: 31.65 | 7: iteration 88700/ 115203 | consumed samples: 22707200 | consumed tokens: 46504345600 | elapsed time per iteration (s): 0.38 | learning rate: 4.294E-05 | global batch size: 256 | lm loss: 2.969090E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.232 | TFLOPs: 31.66 | 7: iteration 88800/ 115203 | consumed samples: 22732800 | consumed tokens: 46556774400 | elapsed time per iteration (s): 0.38 | learning rate: 4.277E-05 | global batch size: 256 | lm loss: 2.968723E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.343 | TFLOPs: 31.80 | 7: iteration 88900/ 115203 | consumed samples: 22758400 | consumed tokens: 46609203200 | elapsed time per iteration (s): 0.38 | learning rate: 4.261E-05 | global batch size: 256 | lm loss: 2.970281E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.658 | TFLOPs: 31.68 | 7: iteration 89000/ 115203 | consumed samples: 22784000 | consumed tokens: 46661632000 | elapsed time per iteration (s): 0.38 | learning rate: 4.244E-05 | global batch size: 256 | lm loss: 2.971574E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.994 | TFLOPs: 31.79 | 7: iteration 89100/ 115203 | consumed samples: 22809600 | consumed tokens: 46714060800 | elapsed time per iteration (s): 0.38 | learning rate: 4.228E-05 | global batch size: 256 | lm loss: 2.969548E+00 | grad norm: 0.277 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.631 | TFLOPs: 31.77 | 7: iteration 89200/ 115203 | consumed samples: 22835200 | consumed tokens: 46766489600 | elapsed time per iteration (s): 0.38 | learning rate: 4.212E-05 | global batch size: 256 | lm loss: 2.972719E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.284 | TFLOPs: 31.80 | 7: iteration 89300/ 115203 | consumed samples: 22860800 | consumed tokens: 46818918400 | elapsed time per iteration (s): 0.38 | learning rate: 4.195E-05 | global batch size: 256 | lm loss: 2.968537E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.294 | TFLOPs: 31.85 | 7: iteration 89400/ 115203 | consumed samples: 22886400 | consumed tokens: 46871347200 | elapsed time per iteration (s): 0.37 | learning rate: 4.179E-05 | global batch size: 256 | lm loss: 2.971944E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.060 | TFLOPs: 31.88 | 7: iteration 89500/ 115203 | consumed samples: 22912000 | consumed tokens: 46923776000 | elapsed time per iteration (s): 0.38 | learning rate: 4.163E-05 | global batch size: 256 | lm loss: 2.970055E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.299 | TFLOPs: 31.66 | 7: iteration 89600/ 115203 | consumed samples: 22937600 | consumed tokens: 46976204800 | elapsed time per iteration (s): 0.38 | learning rate: 4.147E-05 | global batch size: 256 | lm loss: 2.970498E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.465 | TFLOPs: 31.76 | 7: iteration 89700/ 115203 | consumed samples: 22963200 | consumed tokens: 47028633600 | elapsed time per iteration (s): 0.37 | 
learning rate: 4.131E-05 | global batch size: 256 | lm loss: 2.971291E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.966 | TFLOPs: 31.88 | 7: iteration 89800/ 115203 | consumed samples: 22988800 | consumed tokens: 47081062400 | elapsed time per iteration (s): 0.38 | learning rate: 4.115E-05 | global batch size: 256 | lm loss: 2.972499E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.168 | TFLOPs: 31.65 | 7: iteration 89900/ 115203 | consumed samples: 23014400 | consumed tokens: 47133491200 | elapsed time per iteration (s): 0.38 | learning rate: 4.099E-05 | global batch size: 256 | lm loss: 2.968105E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.736 | TFLOPs: 31.73 | 0: [2023-03-17 04:22:41,866] [INFO] [logging.py:68:log_dist] [Rank 0] step=90000, skipped=0, lr=[4.083185080977982e-05, 4.083185080977982e-05, 4.083185080977982e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 90000/ 115203 | consumed samples: 23040000 | consumed tokens: 47185920000 | elapsed time per iteration (s): 0.38 | learning rate: 4.083E-05 | global batch size: 256 | lm loss: 2.969124E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.052 | TFLOPs: 31.65 | 0: steps: 90000 loss: 2.9543 iter time (s): 0.374 samples/sec: 683.676 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 90000 | lm loss value: 3.825497E+00 | lm loss PPL: 4.585560E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 90000 to checkpoints_146m60b100m 0: [2023-03-17 04:22:41,992] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] 
Checkpoint global_step90000 is begin to save! 0: [2023-03-17 04:22:41,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:22:42,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:22:42,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:22:42,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:22:42,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:22:42,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:22:42,122] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:22:42,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:22:42,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:22:42,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:22:42,152] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_07-model_00-model_states.pt... 
0: [2023-03-17 04:22:42,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_07-model_00-model_states.pt. 0: [2023-03-17 04:22:42,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:22:42,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:22:42,182] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_09-model_00-model_states.pt... 0: [2023-03-17 04:22:42,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_09-model_00-model_states.pt. 0: [2023-03-17 04:22:42,196] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_10-model_00-model_states.pt... 0: [2023-03-17 04:22:42,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_10-model_00-model_states.pt. 0: [2023-03-17 04:22:42,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_11-model_00-model_states.pt... 0: [2023-03-17 04:22:42,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_11-model_00-model_states.pt. 0: [2023-03-17 04:22:42,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_12-model_00-model_states.pt... 0: [2023-03-17 04:22:42,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_12-model_00-model_states.pt. 0: [2023-03-17 04:22:42,241] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_13-model_00-model_states.pt... 
0: [2023-03-17 04:22:42,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_13-model_00-model_states.pt. 0: [2023-03-17 04:22:42,256] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_14-model_00-model_states.pt... 0: [2023-03-17 04:22:42,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_14-model_00-model_states.pt. 0: [2023-03-17 04:22:42,271] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_15-model_00-model_states.pt... 0: [2023-03-17 04:22:42,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_15-model_00-model_states.pt. 0: [2023-03-17 04:22:42,286] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_16-model_00-model_states.pt... 0: [2023-03-17 04:22:42,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_16-model_00-model_states.pt. 0: [2023-03-17 04:22:42,300] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_17-model_00-model_states.pt... 0: [2023-03-17 04:22:42,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_17-model_00-model_states.pt. 0: [2023-03-17 04:22:42,315] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/layer_19-model_00-model_states.pt... 0: [2023-03-17 04:22:42,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/layer_19-model_00-model_states.pt. 
0: [2023-03-17 04:22:42,317] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step90000/mp_rank_00_model_states.pt 0: [2023-03-17 04:22:42,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:22:42,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:22:42,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:42,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:42,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:42,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:42,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:42,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:42,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 04:22:42,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 04:22:42,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:42,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:22:42,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:42,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 04:22:42,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:42,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:42,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 04:22:42,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:42,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 04:22:42,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:42,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:42,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 04:22:42,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:22:42,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:42,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 04:22:42,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:42,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:42,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 04:22:42,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:42,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:42,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 04:22:42,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:42,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:42,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:22:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 04:22:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:22:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:22:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 04:22:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 04:22:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:22:42,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 04:22:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:22:42,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 04:22:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:42,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 04:22:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:22:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
0: [2023-03-17 04:22:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 04:22:42,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:22:42,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:42,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 04:22:42,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:42,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:42,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:42,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:42,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:42,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:42,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:42,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:42,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:42,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:42,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:42,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:42,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:42,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:42,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:42,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
1: [2023-03-17 04:22:42,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:42,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:42,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:42,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:42,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 04:22:42,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:42,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:42,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:42,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:42,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 04:22:42,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:42,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:42,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 04:22:42,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 04:22:42,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:42,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:42,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:22:42,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:42,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:42,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:42,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:42,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:42,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:42,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:42,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 04:22:42,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: successfully saved checkpoint at iteration 90000 to checkpoints_146m60b100m 7: time (ms) | save-checkpoint: 449.89 7: iteration 90100/ 115203 | consumed samples: 23065600 | consumed tokens: 47238348800 | elapsed time per iteration (s): 0.38 | learning rate: 4.067E-05 | global batch size: 256 | lm loss: 2.969516E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.879 | TFLOPs: 31.27 | 7: iteration 90200/ 115203 | consumed samples: 23091200 | consumed tokens: 47290777600 | elapsed time per iteration (s): 0.38 | learning rate: 4.052E-05 | global batch size: 256 | lm loss: 2.971395E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.795 | TFLOPs: 31.64 | 7: iteration 90300/ 115203 | consumed samples: 23116800 | consumed tokens: 47343206400 | elapsed time per iteration (s): 0.38 | learning rate: 4.036E-05 | global batch size: 256 | lm loss: 2.971581E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.130 | TFLOPs: 31.65 | 7: iteration 90400/ 115203 | consumed samples: 23142400 | consumed tokens: 47395635200 | elapsed time per iteration (s): 0.38 | learning rate: 4.020E-05 | global batch size: 256 | lm loss: 2.968227E+00 | 
grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.949 | TFLOPs: 31.32 | 7: iteration 90500/ 115203 | consumed samples: 23168000 | consumed tokens: 47448064000 | elapsed time per iteration (s): 0.38 | learning rate: 4.005E-05 | global batch size: 256 | lm loss: 2.971795E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.691 | TFLOPs: 31.59 | 7: iteration 90600/ 115203 | consumed samples: 23193600 | consumed tokens: 47500492800 | elapsed time per iteration (s): 0.38 | learning rate: 3.989E-05 | global batch size: 256 | lm loss: 2.969279E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.738 | TFLOPs: 31.35 | 7: iteration 90700/ 115203 | consumed samples: 23219200 | consumed tokens: 47552921600 | elapsed time per iteration (s): 0.38 | learning rate: 3.973E-05 | global batch size: 256 | lm loss: 2.970032E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.196 | TFLOPs: 31.24 | 7: iteration 90800/ 115203 | consumed samples: 23244800 | consumed tokens: 47605350400 | elapsed time per iteration (s): 0.38 | learning rate: 3.958E-05 | global batch size: 256 | lm loss: 2.968813E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.991 | TFLOPs: 31.23 | 7: iteration 90900/ 115203 | consumed samples: 23270400 | consumed tokens: 47657779200 | elapsed time per iteration (s): 0.38 | learning rate: 3.943E-05 | global batch size: 256 | lm loss: 2.969896E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.157 | TFLOPs: 31.37 | 7: iteration 91000/ 115203 | consumed samples: 23296000 | consumed tokens: 47710208000 | elapsed time 
per iteration (s): 0.38 | learning rate: 3.927E-05 | global batch size: 256 | lm loss: 2.966942E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.412 | TFLOPs: 31.20 | 7: iteration 91100/ 115203 | consumed samples: 23321600 | consumed tokens: 47762636800 | elapsed time per iteration (s): 0.38 | learning rate: 3.912E-05 | global batch size: 256 | lm loss: 2.966625E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.640 | TFLOPs: 31.26 | 7: iteration 91200/ 115203 | consumed samples: 23347200 | consumed tokens: 47815065600 | elapsed time per iteration (s): 0.38 | learning rate: 3.897E-05 | global batch size: 256 | lm loss: 2.971026E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 671.873 | TFLOPs: 31.36 | 7: iteration 91300/ 115203 | consumed samples: 23372800 | consumed tokens: 47867494400 | elapsed time per iteration (s): 0.38 | learning rate: 3.881E-05 | global batch size: 256 | lm loss: 2.966889E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.954 | TFLOPs: 31.55 | 7: iteration 91400/ 115203 | consumed samples: 23398400 | consumed tokens: 47919923200 | elapsed time per iteration (s): 0.38 | learning rate: 3.866E-05 | global batch size: 256 | lm loss: 2.961689E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.923 | TFLOPs: 31.32 | 7: iteration 91500/ 115203 | consumed samples: 23424000 | consumed tokens: 47972352000 | elapsed time per iteration (s): 0.38 | learning rate: 3.851E-05 | global batch size: 256 | lm loss: 2.970804E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.299 | TFLOPs: 31.29 | 
7: iteration 91600/ 115203 | consumed samples: 23449600 | consumed tokens: 48024780800 | elapsed time per iteration (s): 0.38 | learning rate: 3.836E-05 | global batch size: 256 | lm loss: 2.968504E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.390 | TFLOPs: 31.24 | 7: iteration 91700/ 115203 | consumed samples: 23475200 | consumed tokens: 48077209600 | elapsed time per iteration (s): 0.38 | learning rate: 3.821E-05 | global batch size: 256 | lm loss: 2.966366E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.139 | TFLOPs: 31.42 | 7: iteration 91800/ 115203 | consumed samples: 23500800 | consumed tokens: 48129638400 | elapsed time per iteration (s): 0.38 | learning rate: 3.806E-05 | global batch size: 256 | lm loss: 2.964622E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.109 | TFLOPs: 31.51 | 7: iteration 91900/ 115203 | consumed samples: 23526400 | consumed tokens: 48182067200 | elapsed time per iteration (s): 0.38 | learning rate: 3.791E-05 | global batch size: 256 | lm loss: 2.967777E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.855 | TFLOPs: 31.17 | 0: [2023-03-17 04:35:24,043] [INFO] [logging.py:68:log_dist] [Rank 0] step=92000, skipped=0, lr=[3.776612403864962e-05, 3.776612403864962e-05, 3.776612403864962e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 92000/ 115203 | consumed samples: 23552000 | consumed tokens: 48234496000 | elapsed time per iteration (s): 0.38 | learning rate: 3.777E-05 | global batch size: 256 | lm loss: 2.963540E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.400 | TFLOPs: 31.15 | 0: steps: 92000 loss: 2.9605 iter 
time (s): 0.379 samples/sec: 675.322 7: iteration 92100/ 115203 | consumed samples: 23577600 | consumed tokens: 48286924800 | elapsed time per iteration (s): 0.38 | learning rate: 3.762E-05 | global batch size: 256 | lm loss: 2.965129E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.927 | TFLOPs: 31.22 | 7: iteration 92200/ 115203 | consumed samples: 23603200 | consumed tokens: 48339353600 | elapsed time per iteration (s): 0.38 | learning rate: 3.747E-05 | global batch size: 256 | lm loss: 2.967672E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.989 | TFLOPs: 31.09 | 7: iteration 92300/ 115203 | consumed samples: 23628800 | consumed tokens: 48391782400 | elapsed time per iteration (s): 0.38 | learning rate: 3.732E-05 | global batch size: 256 | lm loss: 2.964334E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 665.076 | TFLOPs: 31.04 | 7: iteration 92400/ 115203 | consumed samples: 23654400 | consumed tokens: 48444211200 | elapsed time per iteration (s): 0.38 | learning rate: 3.718E-05 | global batch size: 256 | lm loss: 2.966584E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 664.948 | TFLOPs: 31.04 | 7: iteration 92500/ 115203 | consumed samples: 23680000 | consumed tokens: 48496640000 | elapsed time per iteration (s): 0.38 | learning rate: 3.703E-05 | global batch size: 256 | lm loss: 2.970874E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.664 | TFLOPs: 31.16 | 7: iteration 92600/ 115203 | consumed samples: 23705600 | consumed tokens: 48549068800 | elapsed time per iteration (s): 0.39 | learning rate: 3.689E-05 | global batch size: 256 | lm loss: 2.967797E+00 | grad 
norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 656.922 | TFLOPs: 30.66 | 7: iteration 92700/ 115203 | consumed samples: 23731200 | consumed tokens: 48601497600 | elapsed time per iteration (s): 0.39 | learning rate: 3.674E-05 | global batch size: 256 | lm loss: 2.967563E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 660.390 | TFLOPs: 30.82 | 7: iteration 92800/ 115203 | consumed samples: 23756800 | consumed tokens: 48653926400 | elapsed time per iteration (s): 0.39 | learning rate: 3.660E-05 | global batch size: 256 | lm loss: 2.964209E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 658.989 | TFLOPs: 30.76 | 7: iteration 92900/ 115203 | consumed samples: 23782400 | consumed tokens: 48706355200 | elapsed time per iteration (s): 0.39 | learning rate: 3.646E-05 | global batch size: 256 | lm loss: 2.965073E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 660.449 | TFLOPs: 30.83 | 7: iteration 93000/ 115203 | consumed samples: 23808000 | consumed tokens: 48758784000 | elapsed time per iteration (s): 0.39 | learning rate: 3.631E-05 | global batch size: 256 | lm loss: 2.967580E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 661.471 | TFLOPs: 30.88 | 7: iteration 93100/ 115203 | consumed samples: 23833600 | consumed tokens: 48811212800 | elapsed time per iteration (s): 0.39 | learning rate: 3.617E-05 | global batch size: 256 | lm loss: 2.966946E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 657.782 | TFLOPs: 30.70 | 7: iteration 93200/ 115203 | consumed samples: 23859200 | consumed tokens: 48863641600 | elapsed time per 
iteration (s): 0.39 | learning rate: 3.603E-05 | global batch size: 256 | lm loss: 2.964002E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 661.963 | TFLOPs: 30.90 | 7: iteration 93300/ 115203 | consumed samples: 23884800 | consumed tokens: 48916070400 | elapsed time per iteration (s): 0.39 | learning rate: 3.589E-05 | global batch size: 256 | lm loss: 2.965819E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.987 | TFLOPs: 30.95 | 7: iteration 93400/ 115203 | consumed samples: 23910400 | consumed tokens: 48968499200 | elapsed time per iteration (s): 0.39 | learning rate: 3.575E-05 | global batch size: 256 | lm loss: 2.967896E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 662.815 | TFLOPs: 30.94 | 7: iteration 93500/ 115203 | consumed samples: 23936000 | consumed tokens: 49020928000 | elapsed time per iteration (s): 0.38 | learning rate: 3.561E-05 | global batch size: 256 | lm loss: 2.967105E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.333 | TFLOPs: 31.15 | 7: iteration 93600/ 115203 | consumed samples: 23961600 | consumed tokens: 49073356800 | elapsed time per iteration (s): 0.38 | learning rate: 3.547E-05 | global batch size: 256 | lm loss: 2.965631E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.054 | TFLOPs: 31.23 | 7: iteration 93700/ 115203 | consumed samples: 23987200 | consumed tokens: 49125785600 | elapsed time per iteration (s): 0.38 | learning rate: 3.533E-05 | global batch size: 256 | lm loss: 2.960018E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.135 | TFLOPs: 31.14 | 7: 
iteration 93800/ 115203 | consumed samples: 24012800 | consumed tokens: 49178214400 | elapsed time per iteration (s): 0.38 | learning rate: 3.519E-05 | global batch size: 256 | lm loss: 2.965500E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.533 | TFLOPs: 31.25 | 7: iteration 93900/ 115203 | consumed samples: 24038400 | consumed tokens: 49230643200 | elapsed time per iteration (s): 0.38 | learning rate: 3.506E-05 | global batch size: 256 | lm loss: 2.966198E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.726 | TFLOPs: 31.31 | 0: [2023-03-17 04:48:14,675] [INFO] [logging.py:68:log_dist] [Rank 0] step=94000, skipped=0, lr=[3.4919569923835e-05, 3.4919569923835e-05, 3.4919569923835e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 94000/ 115203 | consumed samples: 24064000 | consumed tokens: 49283072000 | elapsed time per iteration (s): 0.38 | learning rate: 3.492E-05 | global batch size: 256 | lm loss: 2.965629E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 667.731 | TFLOPs: 31.17 | 0: steps: 94000 loss: 2.9643 iter time (s): 0.383 samples/sec: 668.014 7: iteration 94100/ 115203 | consumed samples: 24089600 | consumed tokens: 49335500800 | elapsed time per iteration (s): 0.38 | learning rate: 3.478E-05 | global batch size: 256 | lm loss: 2.966282E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.399 | TFLOPs: 31.20 | 7: iteration 94200/ 115203 | consumed samples: 24115200 | consumed tokens: 49387929600 | elapsed time per iteration (s): 0.38 | learning rate: 3.465E-05 | global batch size: 256 | lm loss: 2.964064E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
669.502 | TFLOPs: 31.25 | 7: iteration 94300/ 115203 | consumed samples: 24140800 | consumed tokens: 49440358400 | elapsed time per iteration (s): 0.38 | learning rate: 3.451E-05 | global batch size: 256 | lm loss: 2.965406E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.854 | TFLOPs: 31.41 | 7: iteration 94400/ 115203 | consumed samples: 24166400 | consumed tokens: 49492787200 | elapsed time per iteration (s): 0.38 | learning rate: 3.438E-05 | global batch size: 256 | lm loss: 2.960623E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.741 | TFLOPs: 31.40 | 7: iteration 94500/ 115203 | consumed samples: 24192000 | consumed tokens: 49545216000 | elapsed time per iteration (s): 0.39 | learning rate: 3.424E-05 | global batch size: 256 | lm loss: 2.961581E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 663.351 | TFLOPs: 30.96 | 7: iteration 94600/ 115203 | consumed samples: 24217600 | consumed tokens: 49597644800 | elapsed time per iteration (s): 0.38 | learning rate: 3.411E-05 | global batch size: 256 | lm loss: 2.965498E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 670.040 | TFLOPs: 31.27 | 7: iteration 94700/ 115203 | consumed samples: 24243200 | consumed tokens: 49650073600 | elapsed time per iteration (s): 0.38 | learning rate: 3.398E-05 | global batch size: 256 | lm loss: 2.961962E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.689 | TFLOPs: 31.59 | 7: iteration 94800/ 115203 | consumed samples: 24268800 | consumed tokens: 49702502400 | elapsed time per iteration (s): 0.38 | learning rate: 3.384E-05 | global batch size: 256 | lm loss: 2.961717E+00 | grad norm: 0.274 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 669.601 | TFLOPs: 31.25 | 7: iteration 94900/ 115203 | consumed samples: 24294400 | consumed tokens: 49754931200 | elapsed time per iteration (s): 0.38 | learning rate: 3.371E-05 | global batch size: 256 | lm loss: 2.962577E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.895 | TFLOPs: 31.50 | 7: iteration 95000/ 115203 | consumed samples: 24320000 | consumed tokens: 49807360000 | elapsed time per iteration (s): 0.38 | learning rate: 3.358E-05 | global batch size: 256 | lm loss: 2.962385E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.687 | TFLOPs: 31.54 | 7: iteration 95100/ 115203 | consumed samples: 24345600 | consumed tokens: 49859788800 | elapsed time per iteration (s): 0.38 | learning rate: 3.345E-05 | global batch size: 256 | lm loss: 2.965066E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.663 | TFLOPs: 31.49 | 7: iteration 95200/ 115203 | consumed samples: 24371200 | consumed tokens: 49912217600 | elapsed time per iteration (s): 0.38 | learning rate: 3.332E-05 | global batch size: 256 | lm loss: 2.965816E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.167 | TFLOPs: 31.61 | 7: iteration 95300/ 115203 | consumed samples: 24396800 | consumed tokens: 49964646400 | elapsed time per iteration (s): 0.38 | learning rate: 3.319E-05 | global batch size: 256 | lm loss: 2.961986E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.331 | TFLOPs: 31.43 | 7: iteration 95400/ 115203 | consumed samples: 24422400 | consumed tokens: 50017075200 | elapsed time per iteration (s): 
0.38 | learning rate: 3.306E-05 | global batch size: 256 | lm loss: 2.959601E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.231 | TFLOPs: 31.56 | 7: iteration 95500/ 115203 | consumed samples: 24448000 | consumed tokens: 50069504000 | elapsed time per iteration (s): 0.38 | learning rate: 3.293E-05 | global batch size: 256 | lm loss: 2.963691E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.022 | TFLOPs: 31.51 | 7: iteration 95600/ 115203 | consumed samples: 24473600 | consumed tokens: 50121932800 | elapsed time per iteration (s): 0.38 | learning rate: 3.281E-05 | global batch size: 256 | lm loss: 2.960362E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.938 | TFLOPs: 31.55 | 7: iteration 95700/ 115203 | consumed samples: 24499200 | consumed tokens: 50174361600 | elapsed time per iteration (s): 0.38 | learning rate: 3.268E-05 | global batch size: 256 | lm loss: 2.963598E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.563 | TFLOPs: 31.49 | 7: iteration 95800/ 115203 | consumed samples: 24524800 | consumed tokens: 50226790400 | elapsed time per iteration (s): 0.38 | learning rate: 3.255E-05 | global batch size: 256 | lm loss: 2.964760E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.786 | TFLOPs: 31.64 | 7: iteration 95900/ 115203 | consumed samples: 24550400 | consumed tokens: 50279219200 | elapsed time per iteration (s): 0.38 | learning rate: 3.243E-05 | global batch size: 256 | lm loss: 2.966395E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.573 | TFLOPs: 31.63 | 0: [2023-03-17 
05:00:54,848] [INFO] [logging.py:68:log_dist] [Rank 0] step=96000, skipped=0, lr=[3.230082550465275e-05, 3.230082550465275e-05, 3.230082550465275e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 96000/ 115203 | consumed samples: 24576000 | consumed tokens: 50331648000 | elapsed time per iteration (s): 0.38 | learning rate: 3.230E-05 | global batch size: 256 | lm loss: 2.958853E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.395 | TFLOPs: 31.52 | 0: steps: 96000 loss: 2.9463 iter time (s): 0.378 samples/sec: 677.157 7: iteration 96100/ 115203 | consumed samples: 24601600 | consumed tokens: 50384076800 | elapsed time per iteration (s): 0.38 | learning rate: 3.218E-05 | global batch size: 256 | lm loss: 2.962504E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.160 | TFLOPs: 31.70 | 7: iteration 96200/ 115203 | consumed samples: 24627200 | consumed tokens: 50436505600 | elapsed time per iteration (s): 0.38 | learning rate: 3.205E-05 | global batch size: 256 | lm loss: 2.960190E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.784 | TFLOPs: 31.59 | 7: iteration 96300/ 115203 | consumed samples: 24652800 | consumed tokens: 50488934400 | elapsed time per iteration (s): 0.38 | learning rate: 3.193E-05 | global batch size: 256 | lm loss: 2.963507E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.681 | TFLOPs: 31.63 | 7: iteration 96400/ 115203 | consumed samples: 24678400 | consumed tokens: 50541363200 | elapsed time per iteration (s): 0.38 | learning rate: 3.181E-05 | global batch size: 256 | lm loss: 2.961212E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
672.554 | TFLOPs: 31.39 | 7: iteration 96500/ 115203 | consumed samples: 24704000 | consumed tokens: 50593792000 | elapsed time per iteration (s): 0.38 | learning rate: 3.168E-05 | global batch size: 256 | lm loss: 2.964447E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.131 | TFLOPs: 31.51 | 7: iteration 96600/ 115203 | consumed samples: 24729600 | consumed tokens: 50646220800 | elapsed time per iteration (s): 0.38 | learning rate: 3.156E-05 | global batch size: 256 | lm loss: 2.959850E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.405 | TFLOPs: 31.48 | 7: iteration 96700/ 115203 | consumed samples: 24755200 | consumed tokens: 50698649600 | elapsed time per iteration (s): 0.38 | learning rate: 3.144E-05 | global batch size: 256 | lm loss: 2.963264E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.917 | TFLOPs: 31.74 | 7: iteration 96800/ 115203 | consumed samples: 24780800 | consumed tokens: 50751078400 | elapsed time per iteration (s): 0.38 | learning rate: 3.132E-05 | global batch size: 256 | lm loss: 2.963760E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.353 | TFLOPs: 31.71 | 7: iteration 96900/ 115203 | consumed samples: 24806400 | consumed tokens: 50803507200 | elapsed time per iteration (s): 0.38 | learning rate: 3.120E-05 | global batch size: 256 | lm loss: 2.962526E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.890 | TFLOPs: 31.69 | 7: iteration 97000/ 115203 | consumed samples: 24832000 | consumed tokens: 50855936000 | elapsed time per iteration (s): 0.38 | learning rate: 3.108E-05 | global batch size: 256 | lm loss: 2.961569E+00 | grad norm: 0.286 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.671 | TFLOPs: 31.63 | 7: iteration 97100/ 115203 | consumed samples: 24857600 | consumed tokens: 50908364800 | elapsed time per iteration (s): 0.57 | learning rate: 3.096E-05 | global batch size: 256 | lm loss: 2.962392E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 447.933 | TFLOPs: 20.91 | 7: iteration 97200/ 115203 | consumed samples: 24883200 | consumed tokens: 50960793600 | elapsed time per iteration (s): 0.37 | learning rate: 3.084E-05 | global batch size: 256 | lm loss: 2.962953E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.816 | TFLOPs: 31.96 | 7: iteration 97300/ 115203 | consumed samples: 24908800 | consumed tokens: 51013222400 | elapsed time per iteration (s): 0.38 | learning rate: 3.072E-05 | global batch size: 256 | lm loss: 2.961153E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.665 | TFLOPs: 31.68 | 7: iteration 97400/ 115203 | consumed samples: 24934400 | consumed tokens: 51065651200 | elapsed time per iteration (s): 0.38 | learning rate: 3.061E-05 | global batch size: 256 | lm loss: 2.962462E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.169 | TFLOPs: 31.70 | 7: iteration 97500/ 115203 | consumed samples: 24960000 | consumed tokens: 51118080000 | elapsed time per iteration (s): 0.38 | learning rate: 3.049E-05 | global batch size: 256 | lm loss: 2.961456E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.000 | TFLOPs: 31.69 | 7: iteration 97600/ 115203 | consumed samples: 24985600 | consumed tokens: 51170508800 | elapsed time per iteration (s): 
0.38 | learning rate: 3.038E-05 | global batch size: 256 | lm loss: 2.958916E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.025 | TFLOPs: 31.69 | 7: iteration 97700/ 115203 | consumed samples: 25011200 | consumed tokens: 51222937600 | elapsed time per iteration (s): 0.38 | learning rate: 3.026E-05 | global batch size: 256 | lm loss: 2.960175E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.357 | TFLOPs: 31.66 | 7: iteration 97800/ 115203 | consumed samples: 25036800 | consumed tokens: 51275366400 | elapsed time per iteration (s): 0.38 | learning rate: 3.015E-05 | global batch size: 256 | lm loss: 2.960771E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.793 | TFLOPs: 31.64 | 7: iteration 97900/ 115203 | consumed samples: 25062400 | consumed tokens: 51327795200 | elapsed time per iteration (s): 0.38 | learning rate: 3.003E-05 | global batch size: 256 | lm loss: 2.964164E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.786 | TFLOPs: 31.59 | 0: [2023-03-17 05:13:49,201] [INFO] [logging.py:68:log_dist] [Rank 0] step=98000, skipped=0, lr=[2.9917836598254863e-05, 2.9917836598254863e-05, 2.9917836598254863e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 98000/ 115203 | consumed samples: 25088000 | consumed tokens: 51380224000 | elapsed time per iteration (s): 0.38 | learning rate: 2.992E-05 | global batch size: 256 | lm loss: 2.963749E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.668 | TFLOPs: 31.77 | 0: steps: 98000 loss: 2.9374 iter time (s): 0.385 samples/sec: 664.908 7: iteration 98100/ 115203 | consumed samples: 25113600 | consumed tokens: 
51432652800 | elapsed time per iteration (s): 0.38 | learning rate: 2.981E-05 | global batch size: 256 | lm loss: 2.964787E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.047 | TFLOPs: 31.56 | 7: iteration 98200/ 115203 | consumed samples: 25139200 | consumed tokens: 51485081600 | elapsed time per iteration (s): 0.38 | learning rate: 2.969E-05 | global batch size: 256 | lm loss: 2.964667E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.198 | TFLOPs: 31.70 | 7: iteration 98300/ 115203 | consumed samples: 25164800 | consumed tokens: 51537510400 | elapsed time per iteration (s): 0.38 | learning rate: 2.958E-05 | global batch size: 256 | lm loss: 2.958403E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.297 | TFLOPs: 31.66 | 7: iteration 98400/ 115203 | consumed samples: 25190400 | consumed tokens: 51589939200 | elapsed time per iteration (s): 0.38 | learning rate: 2.947E-05 | global batch size: 256 | lm loss: 2.959511E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.700 | TFLOPs: 31.54 | 7: iteration 98500/ 115203 | consumed samples: 25216000 | consumed tokens: 51642368000 | elapsed time per iteration (s): 0.38 | learning rate: 2.936E-05 | global batch size: 256 | lm loss: 2.963420E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.020 | TFLOPs: 31.60 | 7: iteration 98600/ 115203 | consumed samples: 25241600 | consumed tokens: 51694796800 | elapsed time per iteration (s): 0.38 | learning rate: 2.925E-05 | global batch size: 256 | lm loss: 2.961186E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
679.846 | TFLOPs: 31.73 | 7: iteration 98700/ 115203 | consumed samples: 25267200 | consumed tokens: 51747225600 | elapsed time per iteration (s): 0.38 | learning rate: 2.914E-05 | global batch size: 256 | lm loss: 2.960999E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.974 | TFLOPs: 31.74 | 7: iteration 98800/ 115203 | consumed samples: 25292800 | consumed tokens: 51799654400 | elapsed time per iteration (s): 0.38 | learning rate: 2.903E-05 | global batch size: 256 | lm loss: 2.960456E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.319 | TFLOPs: 31.71 | 7: iteration 98900/ 115203 | consumed samples: 25318400 | consumed tokens: 51852083200 | elapsed time per iteration (s): 0.37 | learning rate: 2.892E-05 | global batch size: 256 | lm loss: 2.962391E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.669 | TFLOPs: 31.86 | 7: iteration 99000/ 115203 | consumed samples: 25344000 | consumed tokens: 51904512000 | elapsed time per iteration (s): 0.38 | learning rate: 2.882E-05 | global batch size: 256 | lm loss: 2.964752E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.786 | TFLOPs: 31.82 | 7: iteration 99100/ 115203 | consumed samples: 25369600 | consumed tokens: 51956940800 | elapsed time per iteration (s): 0.38 | learning rate: 2.871E-05 | global batch size: 256 | lm loss: 2.959993E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.015 | TFLOPs: 31.69 | 7: iteration 99200/ 115203 | consumed samples: 25395200 | consumed tokens: 52009369600 | elapsed time per iteration (s): 0.38 | learning rate: 2.860E-05 | global batch size: 256 | lm loss: 2.958188E+00 | grad norm: 0.283 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.392 | TFLOPs: 31.80 | 7: iteration 99300/ 115203 | consumed samples: 25420800 | consumed tokens: 52061798400 | elapsed time per iteration (s): 0.38 | learning rate: 2.850E-05 | global batch size: 256 | lm loss: 2.960181E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.943 | TFLOPs: 31.78 | 7: iteration 99400/ 115203 | consumed samples: 25446400 | consumed tokens: 52114227200 | elapsed time per iteration (s): 0.38 | learning rate: 2.839E-05 | global batch size: 256 | lm loss: 2.959634E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.634 | TFLOPs: 31.86 | 7: iteration 99500/ 115203 | consumed samples: 25472000 | consumed tokens: 52166656000 | elapsed time per iteration (s): 0.37 | learning rate: 2.829E-05 | global batch size: 256 | lm loss: 2.961420E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.844 | TFLOPs: 31.87 | 7: iteration 99600/ 115203 | consumed samples: 25497600 | consumed tokens: 52219084800 | elapsed time per iteration (s): 0.38 | learning rate: 2.819E-05 | global batch size: 256 | lm loss: 2.960156E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.237 | TFLOPs: 31.84 | 7: iteration 99700/ 115203 | consumed samples: 25523200 | consumed tokens: 52271513600 | elapsed time per iteration (s): 0.37 | learning rate: 2.808E-05 | global batch size: 256 | lm loss: 2.968931E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.771 | TFLOPs: 31.87 | 7: iteration 99800/ 115203 | consumed samples: 25548800 | consumed tokens: 52323942400 | elapsed time per iteration (s): 
0.38 | learning rate: 2.798E-05 | global batch size: 256 | lm loss: 2.959062E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.517 | TFLOPs: 31.81 |
7: iteration 99900/ 115203 | consumed samples: 25574400 | consumed tokens: 52376371200 | elapsed time per iteration (s): 0.37 | learning rate: 2.788E-05 | global batch size: 256 | lm loss: 2.963839E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.054 | TFLOPs: 31.88 |
0: [2023-03-17 05:26:21,610] [INFO] [logging.py:68:log_dist] [Rank 0] step=100000, skipped=0, lr=[2.777783369036059e-05, 2.777783369036059e-05, 2.777783369036059e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 100000/ 115203 | consumed samples: 25600000 | consumed tokens: 52428800000 | elapsed time per iteration (s): 0.37 | learning rate: 2.778E-05 | global batch size: 256 | lm loss: 2.958836E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.515 | TFLOPs: 31.90 |
0: steps: 100000 loss: 2.9615 iter time (s): 0.374 samples/sec: 684.293
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 100000 | lm loss value: 3.785809E+00 | lm loss PPL: 4.407130E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 100000 to checkpoints_146m60b100m
0: [2023-03-17 05:26:21,740] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step100000 is begin to save!
0: [2023-03-17 05:26:21,745] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_01-model_00-model_states.pt...
0: [2023-03-17 05:26:21,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_01-model_00-model_states.pt.
0: [2023-03-17 05:26:21,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_03-model_00-model_states.pt...
0: [2023-03-17 05:26:21,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_03-model_00-model_states.pt.
0: [2023-03-17 05:26:21,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_04-model_00-model_states.pt...
0: [2023-03-17 05:26:21,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_04-model_00-model_states.pt.
0: [2023-03-17 05:26:21,874] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_05-model_00-model_states.pt...
0: [2023-03-17 05:26:21,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_05-model_00-model_states.pt.
0: [2023-03-17 05:26:21,889] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_06-model_00-model_states.pt...
0: [2023-03-17 05:26:21,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_06-model_00-model_states.pt.
0: [2023-03-17 05:26:21,904] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_07-model_00-model_states.pt...
0: [2023-03-17 05:26:21,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_07-model_00-model_states.pt.
0: [2023-03-17 05:26:21,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_08-model_00-model_states.pt...
0: [2023-03-17 05:26:21,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_08-model_00-model_states.pt.
0: [2023-03-17 05:26:21,934] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_09-model_00-model_states.pt...
0: [2023-03-17 05:26:21,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_09-model_00-model_states.pt.
0: [2023-03-17 05:26:21,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_10-model_00-model_states.pt...
0: [2023-03-17 05:26:21,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_10-model_00-model_states.pt.
0: [2023-03-17 05:26:21,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_11-model_00-model_states.pt...
0: [2023-03-17 05:26:21,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_11-model_00-model_states.pt.
0: [2023-03-17 05:26:21,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_12-model_00-model_states.pt...
0: [2023-03-17 05:26:21,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_12-model_00-model_states.pt.
0: [2023-03-17 05:26:21,994] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_13-model_00-model_states.pt...
0: [2023-03-17 05:26:22,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_13-model_00-model_states.pt.
0: [2023-03-17 05:26:22,009] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_14-model_00-model_states.pt...
0: [2023-03-17 05:26:22,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_14-model_00-model_states.pt.
0: [2023-03-17 05:26:22,024] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_15-model_00-model_states.pt...
0: [2023-03-17 05:26:22,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_15-model_00-model_states.pt.
0: [2023-03-17 05:26:22,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_16-model_00-model_states.pt...
0: [2023-03-17 05:26:22,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_16-model_00-model_states.pt.
0: [2023-03-17 05:26:22,053] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_17-model_00-model_states.pt...
0: [2023-03-17 05:26:22,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_17-model_00-model_states.pt.
0: [2023-03-17 05:26:22,068] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/layer_19-model_00-model_states.pt...
0: [2023-03-17 05:26:22,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/layer_19-model_00-model_states.pt.
0: [2023-03-17 05:26:22,070] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step100000/mp_rank_00_model_states.pt
0: [2023-03-17 05:26:22,070] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/mp_rank_00_model_states.pt...
0: [2023-03-17 05:26:22,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/mp_rank_00_model_states.pt.
0: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
0: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
4: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt...
4: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt...
4: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt...
4: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt...
4: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt...
7: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
6: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 4: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 7: [2023-03-17 05:26:22,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 6: [2023-03-17 05:26:22,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2023-03-17 05:26:22,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:26:22,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2023-03-17 05:26:22,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 05:26:22,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 05:26:22,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:26:22,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 05:26:22,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 05:26:22,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:26:22,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:26:22,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 05:26:22,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 05:26:22,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 05:26:22,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 05:26:22,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 05:26:22,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:26:22,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 05:26:22,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 05:26:22,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 05:26:22,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 05:26:22,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 05:26:22,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 05:26:22,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:26:22,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 05:26:22,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 05:26:22,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 05:26:22,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 05:26:22,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 05:26:22,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 05:26:22,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 05:26:22,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 05:26:22,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 05:26:22,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 05:26:22,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 05:26:22,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 05:26:22,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 05:26:22,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 05:26:22,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 05:26:22,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 05:26:22,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 05:26:22,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 05:26:22,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 05:26:22,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 05:26:22,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 05:26:22,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
3: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 05:26:22,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 05:26:22,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 05:26:22,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 05:26:22,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 05:26:22,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 05:26:22,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 05:26:22,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 05:26:22,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 05:26:22,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 05:26:22,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 05:26:22,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 05:26:22,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 05:26:22,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 05:26:22,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 05:26:22,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 05:26:22,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 05:26:22,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 05:26:22,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 05:26:22,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 05:26:22,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 05:26:22,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 05:26:22,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 05:26:22,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 05:26:22,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 05:26:22,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 05:26:22,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 05:26:22,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 05:26:22,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 05:26:22,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 05:26:22,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 05:26:22,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 05:26:22,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: successfully saved checkpoint at iteration 100000 to checkpoints_146m60b100m 7: time (ms) | save-checkpoint: 444.67 7: iteration 100100/ 115203 | consumed samples: 25625600 | consumed tokens: 52481228800 | elapsed time per iteration (s): 0.38 | learning rate: 2.768E-05 | global batch size: 256 | lm loss: 2.956562E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.054 | TFLOPs: 31.42 | 7: iteration 100200/ 115203 | consumed samples: 25651200 | consumed tokens: 52533657600 | elapsed time per iteration (s): 0.38 | learning rate: 2.758E-05 | global batch size: 256 | lm loss: 2.957444E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.968 | TFLOPs: 31.83 | 7: iteration 100300/ 115203 | consumed samples: 25676800 | consumed tokens: 52586086400 | elapsed time per iteration (s): 0.37 | learning rate: 2.748E-05 | global batch size: 256 | lm loss: 2.958381E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.807 | TFLOPs: 31.87 | 7: iteration 100400/ 115203 | consumed samples: 25702400 | consumed 
tokens: 52638515200 | elapsed time per iteration (s): 0.37 | learning rate: 2.738E-05 | global batch size: 256 | lm loss: 2.958749E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.909 | TFLOPs: 31.92 | 7: iteration 100500/ 115203 | consumed samples: 25728000 | consumed tokens: 52690944000 | elapsed time per iteration (s): 0.37 | learning rate: 2.728E-05 | global batch size: 256 | lm loss: 2.961566E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.437 | TFLOPs: 31.90 | 7: iteration 100600/ 115203 | consumed samples: 25753600 | consumed tokens: 52743372800 | elapsed time per iteration (s): 0.37 | learning rate: 2.718E-05 | global batch size: 256 | lm loss: 2.953586E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.680 | TFLOPs: 31.86 | 7: iteration 100700/ 115203 | consumed samples: 25779200 | consumed tokens: 52795801600 | elapsed time per iteration (s): 0.38 | learning rate: 2.709E-05 | global batch size: 256 | lm loss: 2.957162E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.562 | TFLOPs: 31.86 | 7: iteration 100800/ 115203 | consumed samples: 25804800 | consumed tokens: 52848230400 | elapsed time per iteration (s): 0.38 | learning rate: 2.699E-05 | global batch size: 256 | lm loss: 2.958308E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.958 | TFLOPs: 31.74 | 7: iteration 100900/ 115203 | consumed samples: 25830400 | consumed tokens: 52900659200 | elapsed time per iteration (s): 0.38 | learning rate: 2.690E-05 | global batch size: 256 | lm loss: 2.957041E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 679.585 | TFLOPs: 31.72 | 7: iteration 101000/ 115203 | consumed samples: 25856000 | consumed tokens: 52953088000 | elapsed time per iteration (s): 0.38 | learning rate: 2.680E-05 | global batch size: 256 | lm loss: 2.961727E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.395 | TFLOPs: 31.76 | 7: iteration 101100/ 115203 | consumed samples: 25881600 | consumed tokens: 53005516800 | elapsed time per iteration (s): 0.38 | learning rate: 2.671E-05 | global batch size: 256 | lm loss: 2.959506E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.021 | TFLOPs: 31.79 | 7: iteration 101200/ 115203 | consumed samples: 25907200 | consumed tokens: 53057945600 | elapsed time per iteration (s): 0.38 | learning rate: 2.661E-05 | global batch size: 256 | lm loss: 2.959541E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.798 | TFLOPs: 31.64 | 7: iteration 101300/ 115203 | consumed samples: 25932800 | consumed tokens: 53110374400 | elapsed time per iteration (s): 0.37 | learning rate: 2.652E-05 | global batch size: 256 | lm loss: 2.958183E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.804 | TFLOPs: 31.87 | 7: iteration 101400/ 115203 | consumed samples: 25958400 | consumed tokens: 53162803200 | elapsed time per iteration (s): 0.38 | learning rate: 2.643E-05 | global batch size: 256 | lm loss: 2.956536E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.225 | TFLOPs: 31.80 | 7: iteration 101500/ 115203 | consumed samples: 25984000 | consumed tokens: 53215232000 | elapsed time per iteration (s): 0.38 | learning rate: 2.634E-05 | global batch size: 256 | lm loss: 2.958935E+00 | 
grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.606 | TFLOPs: 31.81 | 7: iteration 101600/ 115203 | consumed samples: 26009600 | consumed tokens: 53267660800 | elapsed time per iteration (s): 0.38 | learning rate: 2.625E-05 | global batch size: 256 | lm loss: 2.957458E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.187 | TFLOPs: 31.75 | 7: iteration 101700/ 115203 | consumed samples: 26035200 | consumed tokens: 53320089600 | elapsed time per iteration (s): 0.38 | learning rate: 2.615E-05 | global batch size: 256 | lm loss: 2.957745E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.869 | TFLOPs: 31.73 | 7: iteration 101800/ 115203 | consumed samples: 26060800 | consumed tokens: 53372518400 | elapsed time per iteration (s): 0.38 | learning rate: 2.606E-05 | global batch size: 256 | lm loss: 2.958286E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.267 | TFLOPs: 31.80 | 7: iteration 101900/ 115203 | consumed samples: 26086400 | consumed tokens: 53424947200 | elapsed time per iteration (s): 0.37 | learning rate: 2.598E-05 | global batch size: 256 | lm loss: 2.959774E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.264 | TFLOPs: 31.94 | 0: [2023-03-17 05:38:53,425] [INFO] [logging.py:68:log_dist] [Rank 0] step=102000, skipped=0, lr=[2.5887309996453706e-05, 2.5887309996453706e-05, 2.5887309996453706e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 102000/ 115203 | consumed samples: 26112000 | consumed tokens: 53477376000 | elapsed time per iteration (s): 0.38 | learning rate: 2.589E-05 | global batch size: 256 | lm loss: 2.960501E+00 | grad norm: 0.279 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.714 | TFLOPs: 31.73 | 0: steps: 102000 loss: 2.9506 iter time (s): 0.374 samples/sec: 684.079 7: iteration 102100/ 115203 | consumed samples: 26137600 | consumed tokens: 53529804800 | elapsed time per iteration (s): 0.38 | learning rate: 2.580E-05 | global batch size: 256 | lm loss: 2.956472E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.128 | TFLOPs: 31.70 | 7: iteration 102200/ 115203 | consumed samples: 26163200 | consumed tokens: 53582233600 | elapsed time per iteration (s): 0.38 | learning rate: 2.571E-05 | global batch size: 256 | lm loss: 2.957612E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.159 | TFLOPs: 31.75 | 7: iteration 102300/ 115203 | consumed samples: 26188800 | consumed tokens: 53634662400 | elapsed time per iteration (s): 0.38 | learning rate: 2.563E-05 | global batch size: 256 | lm loss: 2.959908E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.012 | TFLOPs: 31.65 | 7: iteration 102400/ 115203 | consumed samples: 26214400 | consumed tokens: 53687091200 | elapsed time per iteration (s): 0.38 | learning rate: 2.554E-05 | global batch size: 256 | lm loss: 2.959268E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.107 | TFLOPs: 31.79 | 7: iteration 102500/ 115203 | consumed samples: 26240000 | consumed tokens: 53739520000 | elapsed time per iteration (s): 0.38 | learning rate: 2.545E-05 | global batch size: 256 | lm loss: 2.961917E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.787 | TFLOPs: 31.82 | 7: iteration 102600/ 115203 | consumed samples: 
26265600 | consumed tokens: 53791948800 | elapsed time per iteration (s): 0.38 | learning rate: 2.537E-05 | global batch size: 256 | lm loss: 2.959803E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.965 | TFLOPs: 31.74 | 7: iteration 102700/ 115203 | consumed samples: 26291200 | consumed tokens: 53844377600 | elapsed time per iteration (s): 0.38 | learning rate: 2.529E-05 | global batch size: 256 | lm loss: 2.959042E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.770 | TFLOPs: 31.78 | 7: iteration 102800/ 115203 | consumed samples: 26316800 | consumed tokens: 53896806400 | elapsed time per iteration (s): 0.38 | learning rate: 2.520E-05 | global batch size: 256 | lm loss: 2.953229E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.139 | TFLOPs: 31.56 | 7: iteration 102900/ 115203 | consumed samples: 26342400 | consumed tokens: 53949235200 | elapsed time per iteration (s): 0.38 | learning rate: 2.512E-05 | global batch size: 256 | lm loss: 2.961960E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.264 | TFLOPs: 31.61 | 7: iteration 103000/ 115203 | consumed samples: 26368000 | consumed tokens: 54001664000 | elapsed time per iteration (s): 0.38 | learning rate: 2.504E-05 | global batch size: 256 | lm loss: 2.955279E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.352 | TFLOPs: 31.57 | 7: iteration 103100/ 115203 | consumed samples: 26393600 | consumed tokens: 54054092800 | elapsed time per iteration (s): 0.37 | learning rate: 2.496E-05 | global batch size: 256 | lm loss: 2.958798E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 682.866 | TFLOPs: 31.87 | 7: iteration 103200/ 115203 | consumed samples: 26419200 | consumed tokens: 54106521600 | elapsed time per iteration (s): 0.38 | learning rate: 2.488E-05 | global batch size: 256 | lm loss: 2.955395E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.830 | TFLOPs: 31.83 | 7: iteration 103300/ 115203 | consumed samples: 26444800 | consumed tokens: 54158950400 | elapsed time per iteration (s): 0.38 | learning rate: 2.480E-05 | global batch size: 256 | lm loss: 2.955092E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.105 | TFLOPs: 31.74 | 7: iteration 103400/ 115203 | consumed samples: 26470400 | consumed tokens: 54211379200 | elapsed time per iteration (s): 0.38 | learning rate: 2.472E-05 | global batch size: 256 | lm loss: 2.958976E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.164 | TFLOPs: 31.84 | 7: iteration 103500/ 115203 | consumed samples: 26496000 | consumed tokens: 54263808000 | elapsed time per iteration (s): 0.38 | learning rate: 2.464E-05 | global batch size: 256 | lm loss: 2.954929E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.750 | TFLOPs: 31.82 | 7: iteration 103600/ 115203 | consumed samples: 26521600 | consumed tokens: 54316236800 | elapsed time per iteration (s): 0.38 | learning rate: 2.456E-05 | global batch size: 256 | lm loss: 2.952952E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.979 | TFLOPs: 31.83 | 7: iteration 103700/ 115203 | consumed samples: 26547200 | consumed tokens: 54368665600 | elapsed time per iteration (s): 0.38 | learning rate: 2.448E-05 | global batch size: 256 | 
lm loss: 2.954982E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.825 | TFLOPs: 31.83 | 7: iteration 103800/ 115203 | consumed samples: 26572800 | consumed tokens: 54421094400 | elapsed time per iteration (s): 0.38 | learning rate: 2.440E-05 | global batch size: 256 | lm loss: 2.955527E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.833 | TFLOPs: 31.78 | 7: iteration 103900/ 115203 | consumed samples: 26598400 | consumed tokens: 54473523200 | elapsed time per iteration (s): 0.38 | learning rate: 2.433E-05 | global batch size: 256 | lm loss: 2.956156E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.281 | TFLOPs: 31.66 | 0: [2023-03-17 05:51:26,140] [INFO] [logging.py:68:log_dist] [Rank 0] step=104000, skipped=0, lr=[2.4252001760011466e-05, 2.4252001760011466e-05, 2.4252001760011466e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 104000/ 115203 | consumed samples: 26624000 | consumed tokens: 54525952000 | elapsed time per iteration (s): 0.38 | learning rate: 2.425E-05 | global batch size: 256 | lm loss: 2.957812E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.342 | TFLOPs: 31.85 | 0: steps: 104000 loss: 2.9479 iter time (s): 0.375 samples/sec: 682.856 7: iteration 104100/ 115203 | consumed samples: 26649600 | consumed tokens: 54578380800 | elapsed time per iteration (s): 0.38 | learning rate: 2.418E-05 | global batch size: 256 | lm loss: 2.952973E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.566 | TFLOPs: 31.81 | 7: iteration 104200/ 115203 | consumed samples: 26675200 | consumed tokens: 54630809600 | elapsed time per iteration (s): 0.38 | 
learning rate: 2.410E-05 | global batch size: 256 | lm loss: 2.955842E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.899 | TFLOPs: 31.74 | 7: iteration 104300/ 115203 | consumed samples: 26700800 | consumed tokens: 54683238400 | elapsed time per iteration (s): 0.38 | learning rate: 2.403E-05 | global batch size: 256 | lm loss: 2.959027E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.114 | TFLOPs: 31.84 | 7: iteration 104400/ 115203 | consumed samples: 26726400 | consumed tokens: 54735667200 | elapsed time per iteration (s): 0.38 | learning rate: 2.396E-05 | global batch size: 256 | lm loss: 2.955871E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.156 | TFLOPs: 31.79 | 7: iteration 104500/ 115203 | consumed samples: 26752000 | consumed tokens: 54788096000 | elapsed time per iteration (s): 0.38 | learning rate: 2.388E-05 | global batch size: 256 | lm loss: 2.956245E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.370 | TFLOPs: 31.80 | 7: iteration 104600/ 115203 | consumed samples: 26777600 | consumed tokens: 54840524800 | elapsed time per iteration (s): 0.38 | learning rate: 2.381E-05 | global batch size: 256 | lm loss: 2.956365E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.164 | TFLOPs: 31.79 | 7: iteration 104700/ 115203 | consumed samples: 26803200 | consumed tokens: 54892953600 | elapsed time per iteration (s): 0.38 | learning rate: 2.374E-05 | global batch size: 256 | lm loss: 2.953498E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.518 | TFLOPs: 31.62 | 7: iteration 104800/ 
115203 | consumed samples: 26828800 | consumed tokens: 54945382400 | elapsed time per iteration (s): 0.38 | learning rate: 2.367E-05 | global batch size: 256 | lm loss: 2.956720E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.269 | TFLOPs: 31.80 | 7: iteration 104900/ 115203 | consumed samples: 26854400 | consumed tokens: 54997811200 | elapsed time per iteration (s): 0.38 | learning rate: 2.360E-05 | global batch size: 256 | lm loss: 2.954114E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.331 | TFLOPs: 31.85 | 7: iteration 105000/ 115203 | consumed samples: 26880000 | consumed tokens: 55050240000 | elapsed time per iteration (s): 0.38 | learning rate: 2.353E-05 | global batch size: 256 | lm loss: 2.955081E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.378 | TFLOPs: 31.76 | 7: iteration 105100/ 115203 | consumed samples: 26905600 | consumed tokens: 55102668800 | elapsed time per iteration (s): 0.38 | learning rate: 2.346E-05 | global batch size: 256 | lm loss: 2.957284E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.150 | TFLOPs: 31.61 | 7: iteration 105200/ 115203 | consumed samples: 26931200 | consumed tokens: 55155097600 | elapsed time per iteration (s): 0.38 | learning rate: 2.340E-05 | global batch size: 256 | lm loss: 2.956768E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.835 | TFLOPs: 31.73 | 7: iteration 105300/ 115203 | consumed samples: 26956800 | consumed tokens: 55207526400 | elapsed time per iteration (s): 0.38 | learning rate: 2.333E-05 | global batch size: 256 | lm loss: 2.953636E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 679.934 | TFLOPs: 31.74 | 7: iteration 105400/ 115203 | consumed samples: 26982400 | consumed tokens: 55259955200 | elapsed time per iteration (s): 0.38 | learning rate: 2.326E-05 | global batch size: 256 | lm loss: 2.952372E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.812 | TFLOPs: 31.78 | 7: iteration 105500/ 115203 | consumed samples: 27008000 | consumed tokens: 55312384000 | elapsed time per iteration (s): 0.38 | learning rate: 2.320E-05 | global batch size: 256 | lm loss: 2.955403E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.608 | TFLOPs: 31.81 | 7: iteration 105600/ 115203 | consumed samples: 27033600 | consumed tokens: 55364812800 | elapsed time per iteration (s): 0.37 | learning rate: 2.313E-05 | global batch size: 256 | lm loss: 2.954180E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.283 | TFLOPs: 31.94 | 7: iteration 105700/ 115203 | consumed samples: 27059200 | consumed tokens: 55417241600 | elapsed time per iteration (s): 0.38 | learning rate: 2.307E-05 | global batch size: 256 | lm loss: 2.951900E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.439 | TFLOPs: 31.85 | 7: iteration 105800/ 115203 | consumed samples: 27084800 | consumed tokens: 55469670400 | elapsed time per iteration (s): 0.37 | learning rate: 2.300E-05 | global batch size: 256 | lm loss: 2.956959E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.777 | TFLOPs: 31.87 | 7: iteration 105900/ 115203 | consumed samples: 27110400 | consumed tokens: 55522099200 | elapsed time per iteration (s): 0.38 | learning rate: 
2.294E-05 | global batch size: 256 | lm loss: 2.951958E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.951 | TFLOPs: 31.83 | 0: [2023-03-17 06:03:57,802] [INFO] [logging.py:68:log_dist] [Rank 0] step=106000, skipped=0, lr=[2.2876870847544666e-05, 2.2876870847544666e-05, 2.2876870847544666e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 106000/ 115203 | consumed samples: 27136000 | consumed tokens: 55574528000 | elapsed time per iteration (s): 0.37 | learning rate: 2.288E-05 | global batch size: 256 | lm loss: 2.959597E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.671 | TFLOPs: 31.91 | 0: steps: 106000 loss: 2.9861 iter time (s): 0.374 samples/sec: 683.775 7: iteration 106100/ 115203 | consumed samples: 27161600 | consumed tokens: 55626956800 | elapsed time per iteration (s): 0.38 | learning rate: 2.282E-05 | global batch size: 256 | lm loss: 2.954508E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.588 | TFLOPs: 31.77 | 7: iteration 106200/ 115203 | consumed samples: 27187200 | consumed tokens: 55679385600 | elapsed time per iteration (s): 0.37 | learning rate: 2.275E-05 | global batch size: 256 | lm loss: 2.954349E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.395 | TFLOPs: 31.95 | 7: iteration 106300/ 115203 | consumed samples: 27212800 | consumed tokens: 55731814400 | elapsed time per iteration (s): 0.37 | learning rate: 2.269E-05 | global batch size: 256 | lm loss: 2.954509E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 684.075 | TFLOPs: 31.93 | 7: iteration 106400/ 115203 | consumed samples: 27238400 | consumed tokens: 55784243200 | elapsed 
time per iteration (s): 0.37 | learning rate: 2.263E-05 | global batch size: 256 | lm loss: 2.953808E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.693 | TFLOPs: 31.91 | 7: iteration 106500/ 115203 | consumed samples: 27264000 | consumed tokens: 55836672000 | elapsed time per iteration (s): 0.37 | learning rate: 2.257E-05 | global batch size: 256 | lm loss: 2.954253E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.940 | TFLOPs: 31.92 | 7: iteration 106600/ 115203 | consumed samples: 27289600 | consumed tokens: 55889100800 | elapsed time per iteration (s): 0.37 | learning rate: 2.252E-05 | global batch size: 256 | lm loss: 2.954317E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.816 | TFLOPs: 31.87 | 7: iteration 106700/ 115203 | consumed samples: 27315200 | consumed tokens: 55941529600 | elapsed time per iteration (s): 0.38 | learning rate: 2.246E-05 | global batch size: 256 | lm loss: 2.953192E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.708 | TFLOPs: 31.73 | 7: iteration 106800/ 115203 | consumed samples: 27340800 | consumed tokens: 55993958400 | elapsed time per iteration (s): 0.38 | learning rate: 2.240E-05 | global batch size: 256 | lm loss: 2.958477E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.656 | TFLOPs: 31.72 | 7: iteration 106900/ 115203 | consumed samples: 27366400 | consumed tokens: 56046387200 | elapsed time per iteration (s): 0.38 | learning rate: 2.234E-05 | global batch size: 256 | lm loss: 2.948642E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.711 | TFLOPs: 
31.63 | 7: iteration 107000/ 115203 | consumed samples: 27392000 | consumed tokens: 56098816000 | elapsed time per iteration (s): 0.38 | learning rate: 2.229E-05 | global batch size: 256 | lm loss: 2.955528E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.487 | TFLOPs: 31.76 | 7: iteration 107100/ 115203 | consumed samples: 27417600 | consumed tokens: 56151244800 | elapsed time per iteration (s): 0.38 | learning rate: 2.223E-05 | global batch size: 256 | lm loss: 2.952940E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.527 | TFLOPs: 31.72 | 7: iteration 107200/ 115203 | consumed samples: 27443200 | consumed tokens: 56203673600 | elapsed time per iteration (s): 0.38 | learning rate: 2.218E-05 | global batch size: 256 | lm loss: 2.955843E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.064 | TFLOPs: 31.60 | 7: iteration 107300/ 115203 | consumed samples: 27468800 | consumed tokens: 56256102400 | elapsed time per iteration (s): 0.38 | learning rate: 2.212E-05 | global batch size: 256 | lm loss: 2.953818E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.543 | TFLOPs: 31.67 | 7: iteration 107400/ 115203 | consumed samples: 27494400 | consumed tokens: 56308531200 | elapsed time per iteration (s): 0.38 | learning rate: 2.207E-05 | global batch size: 256 | lm loss: 2.951953E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.797 | TFLOPs: 31.78 | 7: iteration 107500/ 115203 | consumed samples: 27520000 | consumed tokens: 56360960000 | elapsed time per iteration (s): 0.38 | learning rate: 2.202E-05 | global batch size: 256 | lm loss: 2.954464E+00 | grad norm: 0.284 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.368 | TFLOPs: 31.85 | 7: iteration 107600/ 115203 | consumed samples: 27545600 | consumed tokens: 56413388800 | elapsed time per iteration (s): 0.38 | learning rate: 2.197E-05 | global batch size: 256 | lm loss: 2.955056E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.097 | TFLOPs: 31.70 | 7: iteration 107700/ 115203 | consumed samples: 27571200 | consumed tokens: 56465817600 | elapsed time per iteration (s): 0.37 | learning rate: 2.192E-05 | global batch size: 256 | lm loss: 2.956660E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.992 | TFLOPs: 31.88 | 7: iteration 107800/ 115203 | consumed samples: 27596800 | consumed tokens: 56518246400 | elapsed time per iteration (s): 0.37 | learning rate: 2.187E-05 | global batch size: 256 | lm loss: 2.954799E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.020 | TFLOPs: 31.88 | 7: iteration 107900/ 115203 | consumed samples: 27622400 | consumed tokens: 56570675200 | elapsed time per iteration (s): 0.38 | learning rate: 2.182E-05 | global batch size: 256 | lm loss: 2.955754E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.275 | TFLOPs: 31.80 | 0: [2023-03-17 06:16:29,355] [INFO] [logging.py:68:log_dist] [Rank 0] step=108000, skipped=0, lr=[2.176608969325893e-05, 2.176608969325893e-05, 2.176608969325893e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 108000/ 115203 | consumed samples: 27648000 | consumed tokens: 56623104000 | elapsed time per iteration (s): 0.37 | learning rate: 2.177E-05 | global batch size: 256 | lm loss: 2.955168E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 683.531 | TFLOPs: 31.90 | 0: steps: 108000 loss: 2.9697 iter time (s): 0.374 samples/sec: 683.969 7: iteration 108100/ 115203 | consumed samples: 27673600 | consumed tokens: 56675532800 | elapsed time per iteration (s): 0.38 | learning rate: 2.172E-05 | global batch size: 256 | lm loss: 2.956577E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.528 | TFLOPs: 31.86 | 7: iteration 108200/ 115203 | consumed samples: 27699200 | consumed tokens: 56727961600 | elapsed time per iteration (s): 0.38 | learning rate: 2.167E-05 | global batch size: 256 | lm loss: 2.956351E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.405 | TFLOPs: 31.85 | 7: iteration 108300/ 115203 | consumed samples: 27724800 | consumed tokens: 56780390400 | elapsed time per iteration (s): 0.37 | learning rate: 2.162E-05 | global batch size: 256 | lm loss: 2.953129E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.426 | TFLOPs: 31.90 | 7: iteration 108400/ 115203 | consumed samples: 27750400 | consumed tokens: 56832819200 | elapsed time per iteration (s): 0.37 | learning rate: 2.158E-05 | global batch size: 256 | lm loss: 2.952289E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.717 | TFLOPs: 31.91 | 7: iteration 108500/ 115203 | consumed samples: 27776000 | consumed tokens: 56885248000 | elapsed time per iteration (s): 0.37 | learning rate: 2.153E-05 | global batch size: 256 | lm loss: 2.953652E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.653 | TFLOPs: 31.91 | 7: iteration 108600/ 115203 | consumed samples: 27801600 | consumed tokens: 
56937676800 | elapsed time per iteration (s): 0.37 | learning rate: 2.148E-05 | global batch size: 256 | lm loss: 2.951844E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.020 | TFLOPs: 31.88 | 7: iteration 108700/ 115203 | consumed samples: 27827200 | consumed tokens: 56990105600 | elapsed time per iteration (s): 0.37 | learning rate: 2.144E-05 | global batch size: 256 | lm loss: 2.951935E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.556 | TFLOPs: 31.91 | 7: iteration 108800/ 115203 | consumed samples: 27852800 | consumed tokens: 57042534400 | elapsed time per iteration (s): 0.38 | learning rate: 2.140E-05 | global batch size: 256 | lm loss: 2.953805E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.394 | TFLOPs: 31.71 | 7: iteration 108900/ 115203 | consumed samples: 27878400 | consumed tokens: 57094963200 | elapsed time per iteration (s): 0.37 | learning rate: 2.135E-05 | global batch size: 256 | lm loss: 2.957312E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.858 | TFLOPs: 31.87 | 7: iteration 109000/ 115203 | consumed samples: 27904000 | consumed tokens: 57147392000 | elapsed time per iteration (s): 0.38 | learning rate: 2.131E-05 | global batch size: 256 | lm loss: 2.953484E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.152 | TFLOPs: 31.84 | 7: iteration 109100/ 115203 | consumed samples: 27929600 | consumed tokens: 57199820800 | elapsed time per iteration (s): 0.37 | learning rate: 2.127E-05 | global batch size: 256 | lm loss: 2.954173E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 682.947 | TFLOPs: 31.88 | 7: iteration 109200/ 115203 | consumed samples: 27955200 | consumed tokens: 57252249600 | elapsed time per iteration (s): 0.37 | learning rate: 2.123E-05 | global batch size: 256 | lm loss: 2.952913E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.384 | TFLOPs: 31.90 | 7: iteration 109300/ 115203 | consumed samples: 27980800 | consumed tokens: 57304678400 | elapsed time per iteration (s): 0.37 | learning rate: 2.119E-05 | global batch size: 256 | lm loss: 2.957710E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.507 | TFLOPs: 31.90 | 7: iteration 109400/ 115203 | consumed samples: 28006400 | consumed tokens: 57357107200 | elapsed time per iteration (s): 0.37 | learning rate: 2.115E-05 | global batch size: 256 | lm loss: 2.952892E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.000 | TFLOPs: 31.88 | 7: iteration 109500/ 115203 | consumed samples: 28032000 | consumed tokens: 57409536000 | elapsed time per iteration (s): 0.37 | learning rate: 2.111E-05 | global batch size: 256 | lm loss: 2.950456E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.435 | TFLOPs: 31.90 | 7: iteration 109600/ 115203 | consumed samples: 28057600 | consumed tokens: 57461964800 | elapsed time per iteration (s): 0.38 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 2.952662E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.829 | TFLOPs: 31.83 | 7: iteration 109700/ 115203 | consumed samples: 28083200 | consumed tokens: 57514393600 | elapsed time per iteration (s): 0.38 | learning rate: 2.103E-05 | global batch size: 256 | lm loss: 2.956406E+00 | grad 
norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.185 | TFLOPs: 31.66 | 7: iteration 109800/ 115203 | consumed samples: 28108800 | consumed tokens: 57566822400 | elapsed time per iteration (s): 0.38 | learning rate: 2.100E-05 | global batch size: 256 | lm loss: 2.953973E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.815 | TFLOPs: 31.64 | 7: iteration 109900/ 115203 | consumed samples: 28134400 | consumed tokens: 57619251200 | elapsed time per iteration (s): 0.38 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 2.954174E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.974 | TFLOPs: 31.65 | 0: [2023-03-17 06:29:00,013] [INFO] [logging.py:68:log_dist] [Rank 0] step=110000, skipped=0, lr=[2.092302863901853e-05, 2.092302863901853e-05, 2.092302863901853e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 110000/ 115203 | consumed samples: 28160000 | consumed tokens: 57671680000 | elapsed time per iteration (s): 0.37 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 2.953740E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.681 | TFLOPs: 31.87 | 0: steps: 110000 loss: 2.9540 iter time (s): 0.374 samples/sec: 684.777 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 110000 | lm loss value: 3.907446E+00 | lm loss PPL: 4.977169E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 110000 to checkpoints_146m60b100m 0: [2023-03-17 06:29:00,139] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step110000 is begin to save! 
0: [2023-03-17 06:29:00,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_01-model_00-model_states.pt... 0: [2023-03-17 06:29:00,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_01-model_00-model_states.pt. 0: [2023-03-17 06:29:00,238] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_03-model_00-model_states.pt... 0: [2023-03-17 06:29:00,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_03-model_00-model_states.pt. 0: [2023-03-17 06:29:00,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_04-model_00-model_states.pt... 0: [2023-03-17 06:29:00,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_04-model_00-model_states.pt. 0: [2023-03-17 06:29:00,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_05-model_00-model_states.pt... 0: [2023-03-17 06:29:00,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_05-model_00-model_states.pt. 0: [2023-03-17 06:29:00,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_06-model_00-model_states.pt... 0: [2023-03-17 06:29:00,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_06-model_00-model_states.pt. 0: [2023-03-17 06:29:00,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_07-model_00-model_states.pt... 0: [2023-03-17 06:29:00,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_07-model_00-model_states.pt. 
0: [2023-03-17 06:29:00,314] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_08-model_00-model_states.pt... 0: [2023-03-17 06:29:00,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_08-model_00-model_states.pt. 0: [2023-03-17 06:29:00,329] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_09-model_00-model_states.pt... 0: [2023-03-17 06:29:00,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_09-model_00-model_states.pt. 0: [2023-03-17 06:29:00,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_10-model_00-model_states.pt... 0: [2023-03-17 06:29:00,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_10-model_00-model_states.pt. 0: [2023-03-17 06:29:00,359] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_11-model_00-model_states.pt... 0: [2023-03-17 06:29:00,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_11-model_00-model_states.pt. 0: [2023-03-17 06:29:00,374] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_12-model_00-model_states.pt... 0: [2023-03-17 06:29:00,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_12-model_00-model_states.pt. 0: [2023-03-17 06:29:00,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_13-model_00-model_states.pt... 0: [2023-03-17 06:29:00,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_13-model_00-model_states.pt. 
0: [2023-03-17 06:29:00,403] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_14-model_00-model_states.pt... 0: [2023-03-17 06:29:00,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_14-model_00-model_states.pt. 0: [2023-03-17 06:29:00,418] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_15-model_00-model_states.pt... 0: [2023-03-17 06:29:00,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_15-model_00-model_states.pt. 0: [2023-03-17 06:29:00,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_16-model_00-model_states.pt... 0: [2023-03-17 06:29:00,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_16-model_00-model_states.pt. 0: [2023-03-17 06:29:00,448] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_17-model_00-model_states.pt... 0: [2023-03-17 06:29:00,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_17-model_00-model_states.pt. 0: [2023-03-17 06:29:00,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/layer_19-model_00-model_states.pt... 0: [2023-03-17 06:29:00,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/layer_19-model_00-model_states.pt. 0: [2023-03-17 06:29:00,465] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step110000/mp_rank_00_model_states.pt 0: [2023-03-17 06:29:00,465] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/mp_rank_00_model_states.pt... 
0: [2023-03-17 06:29:00,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/mp_rank_00_model_states.pt. 0: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
6: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
2: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
4: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
1: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 06:29:00,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2023-03-17 06:29:00,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
5: [2023-03-17 06:29:00,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:29:00,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 06:29:00,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 06:29:00,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:29:00,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 06:29:00,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 06:29:00,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:29:00,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 06:29:00,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 06:29:00,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:29:00,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 06:29:00,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
0: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:29:00,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:29:00,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 06:29:00,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 06:29:00,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 06:29:00,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:29:00,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 06:29:00,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 06:29:00,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 06:29:00,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
5: [2023-03-17 06:29:00,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:29:00,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 06:29:00,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 06:29:00,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 06:29:00,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 06:29:00,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:29:00,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 06:29:00,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 06:29:00,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:29:00,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 06:29:00,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 06:29:00,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 06:29:00,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:29:00,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:29:00,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 06:29:00,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 06:29:00,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 06:29:00,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 06:29:00,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 06:29:00,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 06:29:00,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:29:00,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 06:29:00,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 06:29:00,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 06:29:00,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 06:29:00,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:29:00,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 06:29:00,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 06:29:00,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:29:00,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
0: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 06:29:00,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 06:29:00,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 06:29:00,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 06:29:00,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 06:29:00,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:29:00,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 06:29:00,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 06:29:00,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 06:29:00,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 06:29:00,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 06:29:00,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 06:29:00,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 06:29:00,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 06:29:00,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 06:29:00,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 06:29:00,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 06:29:00,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 06:29:00,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 06:29:00,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 06:29:00,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:29:00,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 06:29:00,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 06:29:00,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:29:00,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:29:00,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 06:29:00,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 06:29:00,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 06:29:00,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 06:29:00,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 06:29:00,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 06:29:00,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 06:29:00,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 06:29:00,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 06:29:00,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 06:29:00,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:29:00,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:29:00,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 06:29:00,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 06:29:00,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 06:29:00,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 06:29:00,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 06:29:00,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 06:29:00,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 06:29:00,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 06:29:00,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 06:29:00,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 06:29:00,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 06:29:00,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 06:29:00,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 06:29:00,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 06:29:00,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 06:29:00,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 06:29:00,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 06:29:00,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 06:29:00,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 06:29:00,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 06:29:00,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 06:29:00,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 06:29:00,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 06:29:00,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 06:29:00,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 06:29:00,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 06:29:00,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:29:00,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 06:29:00,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
2: [2023-03-17 06:29:00,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 06:29:00,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 06:29:00,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 06:29:00,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 06:29:00,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 06:29:00,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 06:29:00,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
0: successfully saved checkpoint at iteration 110000 to checkpoints_146m60b100m
7: time (ms) | save-checkpoint: 425.63
7: iteration 110100/ 115203 | consumed samples: 28185600 | consumed tokens: 57724108800 | elapsed time per iteration (s): 0.38 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 2.953890E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.749 | TFLOPs: 31.45 |
7: iteration 110200/ 115203 | consumed samples: 28211200 | consumed tokens: 57776537600 | elapsed time per iteration (s): 0.37 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 2.952292E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.267 | TFLOPs: 31.89 |
7: iteration 110300/ 115203 | consumed samples: 28236800 | consumed tokens: 57828966400 | elapsed time per iteration (s): 0.37 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 2.954772E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.497 | TFLOPs: 31.90 |
7: iteration 110400/ 115203 | consumed samples: 28262400 | consumed tokens: 57881395200 | elapsed time per iteration (s): 0.38 | learning rate: 2.079E-05 | global batch size: 256 | lm loss: 2.953161E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.645 | TFLOPs: 31.63 |
7: iteration 110500/ 115203 | consumed samples: 28288000 | consumed tokens: 57933824000 | elapsed time per iteration (s): 0.38 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 2.954185E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.841 | TFLOPs: 31.69 |
7: iteration 110600/ 115203 | consumed samples: 28313600 | consumed tokens: 57986252800 | elapsed time per iteration (s): 0.37 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 2.954003E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.451 | TFLOPs: 31.90 |
7: iteration 110700/ 115203 | consumed samples: 28339200 | consumed tokens: 58038681600 | elapsed time per iteration (s): 0.37 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 2.951849E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.248 | TFLOPs: 31.89 |
7: iteration 110800/ 115203 | consumed samples: 28364800 | consumed tokens: 58091110400 | elapsed time per iteration (s): 0.37 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 2.952955E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.595 | TFLOPs: 31.91 |
7: iteration 110900/ 115203 | consumed samples: 28390400 | consumed tokens: 58143539200 | elapsed time per iteration (s): 0.37 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 2.953469E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 683.764 | TFLOPs: 31.92 |
7: iteration 111000/ 115203 | consumed samples: 28416000 | consumed tokens: 58195968000 | elapsed time per iteration (s): 0.38 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 2.944440E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.442 | TFLOPs: 31.85 |
7: iteration 111100/ 115203 | consumed samples: 28441600 | consumed tokens: 58248396800 | elapsed time per iteration (s): 0.38 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 2.957772E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.336 | TFLOPs: 31.66 |
7: iteration 111200/ 115203 | consumed samples: 28467200 | consumed tokens: 58300825600 | elapsed time per iteration (s): 0.38 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 2.952030E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.322 | TFLOPs: 31.57 |
7: iteration 111300/ 115203 | consumed samples: 28492800 | consumed tokens: 58353254400 | elapsed time per iteration (s): 0.38 | learning rate: 2.052E-05 | global batch size: 256 | lm loss: 2.949708E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.507 | TFLOPs: 31.58 |
7: iteration 111400/ 115203 | consumed samples: 28518400 | consumed tokens: 58405683200 | elapsed time per iteration (s): 0.38 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 2.951187E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 681.713 | TFLOPs: 31.82 |
7: iteration 111500/ 115203 | consumed samples: 28544000 | consumed tokens: 58458112000 | elapsed time per iteration (s): 0.38 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 2.954704E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.339 | TFLOPs: 31.66 |
7: iteration 111600/ 115203 | consumed samples: 28569600 | consumed tokens: 58510540800 | elapsed time per iteration (s): 0.38 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 2.949869E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.202 | TFLOPs: 31.66 |
7: iteration 111700/ 115203 | consumed samples: 28595200 | consumed tokens: 58562969600 | elapsed time per iteration (s): 0.38 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 2.950733E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.499 | TFLOPs: 31.67 |
7: iteration 111800/ 115203 | consumed samples: 28620800 | consumed tokens: 58615398400 | elapsed time per iteration (s): 0.38 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 2.952047E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.299 | TFLOPs: 31.61 |
7: iteration 111900/ 115203 | consumed samples: 28646400 | consumed tokens: 58667827200 | elapsed time per iteration (s): 0.38 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 2.954519E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 673.875 | TFLOPs: 31.45 |
0: [2023-03-17 06:41:33,797] [INFO] [logging.py:68:log_dist] [Rank 0] step=112000, skipped=0, lr=[2.0350245708025642e-05, 2.0350245708025642e-05, 2.0350245708025642e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 112000/ 115203 | consumed samples: 28672000 | consumed tokens: 58720256000 | elapsed time per iteration (s): 0.38 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 2.953916E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.587 | TFLOPs: 31.39 |
0: steps: 112000 loss: 2.9514 iter time (s): 0.375 samples/sec: 683.348
7: iteration 112100/ 115203 | consumed samples: 28697600 | consumed tokens: 58772684800 | elapsed time per iteration (s): 0.38 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 2.952850E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.897 | TFLOPs: 31.50 |
7: iteration 112200/ 115203 | consumed samples: 28723200 | consumed tokens: 58825113600 | elapsed time per iteration (s): 0.38 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 2.951086E+00 |
grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.213 | TFLOPs: 31.70 | 7: iteration 112300/ 115203 | consumed samples: 28748800 | consumed tokens: 58877542400 | elapsed time per iteration (s): 0.38 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 2.954075E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.495 | TFLOPs: 31.67 | 7: iteration 112400/ 115203 | consumed samples: 28774400 | consumed tokens: 58929971200 | elapsed time per iteration (s): 0.38 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 2.955087E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.475 | TFLOPs: 31.67 | 7: iteration 112500/ 115203 | consumed samples: 28800000 | consumed tokens: 58982400000 | elapsed time per iteration (s): 0.38 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 2.951289E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.073 | TFLOPs: 31.65 | 7: iteration 112600/ 115203 | consumed samples: 28825600 | consumed tokens: 59034828800 | elapsed time per iteration (s): 0.38 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 2.946900E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.757 | TFLOPs: 31.54 | 7: iteration 112700/ 115203 | consumed samples: 28851200 | consumed tokens: 59087257600 | elapsed time per iteration (s): 0.38 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 2.946242E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.170 | TFLOPs: 31.51 | 7: iteration 112800/ 115203 | consumed samples: 28876800 | consumed tokens: 59139686400 | elapsed 
time per iteration (s): 0.38 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 2.947741E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.818 | TFLOPs: 31.59 | 7: iteration 112900/ 115203 | consumed samples: 28902400 | consumed tokens: 59192115200 | elapsed time per iteration (s): 0.38 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 2.952080E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 678.653 | TFLOPs: 31.68 | 7: iteration 113000/ 115203 | consumed samples: 28928000 | consumed tokens: 59244544000 | elapsed time per iteration (s): 0.38 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 2.952752E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.966 | TFLOPs: 31.55 | 7: iteration 113100/ 115203 | consumed samples: 28953600 | consumed tokens: 59296972800 | elapsed time per iteration (s): 0.38 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.946702E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.703 | TFLOPs: 31.63 | 7: iteration 113200/ 115203 | consumed samples: 28979200 | consumed tokens: 59349401600 | elapsed time per iteration (s): 0.38 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 2.953190E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.798 | TFLOPs: 31.54 | 7: iteration 113300/ 115203 | consumed samples: 29004800 | consumed tokens: 59401830400 | elapsed time per iteration (s): 0.38 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.952957E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 668.746 | TFLOPs: 
31.21 | 7: iteration 113400/ 115203 | consumed samples: 29030400 | consumed tokens: 59454259200 | elapsed time per iteration (s): 0.38 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.952570E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 672.633 | TFLOPs: 31.40 | 7: iteration 113500/ 115203 | consumed samples: 29056000 | consumed tokens: 59506688000 | elapsed time per iteration (s): 0.38 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.952206E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.942 | TFLOPs: 31.50 | 7: iteration 113600/ 115203 | consumed samples: 29081600 | consumed tokens: 59559116800 | elapsed time per iteration (s): 0.38 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.951216E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.177 | TFLOPs: 31.75 | 7: iteration 113700/ 115203 | consumed samples: 29107200 | consumed tokens: 59611545600 | elapsed time per iteration (s): 0.38 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.951164E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.589 | TFLOPs: 31.49 | 7: iteration 113800/ 115203 | consumed samples: 29132800 | consumed tokens: 59663974400 | elapsed time per iteration (s): 0.38 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.955352E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 675.790 | TFLOPs: 31.54 | 7: iteration 113900/ 115203 | consumed samples: 29158400 | consumed tokens: 59716403200 | elapsed time per iteration (s): 0.38 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.948821E+00 | grad norm: 0.283 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.951 | TFLOPs: 31.60 | 0: [2023-03-17 06:54:10,657] [INFO] [logging.py:68:log_dist] [Rank 0] step=114000, skipped=0, lr=[2.004947884324412e-05, 2.004947884324412e-05, 2.004947884324412e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 114000/ 115203 | consumed samples: 29184000 | consumed tokens: 59768832000 | elapsed time per iteration (s): 0.38 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.943916E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.969 | TFLOPs: 31.79 | 0: steps: 114000 loss: 2.9605 iter time (s): 0.376 samples/sec: 680.078 7: iteration 114100/ 115203 | consumed samples: 29209600 | consumed tokens: 59821260800 | elapsed time per iteration (s): 0.38 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.951704E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.813 | TFLOPs: 31.73 | 7: iteration 114200/ 115203 | consumed samples: 29235200 | consumed tokens: 59873689600 | elapsed time per iteration (s): 0.38 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.951578E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.934 | TFLOPs: 31.64 | 7: iteration 114300/ 115203 | consumed samples: 29260800 | consumed tokens: 59926118400 | elapsed time per iteration (s): 0.38 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.950224E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 674.863 | TFLOPs: 31.50 | 7: iteration 114400/ 115203 | consumed samples: 29286400 | consumed tokens: 59978547200 | elapsed time per iteration (s): 0.38 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 
2.952292E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 676.790 | TFLOPs: 31.59 | 7: iteration 114500/ 115203 | consumed samples: 29312000 | consumed tokens: 60030976000 | elapsed time per iteration (s): 0.38 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.951068E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.472 | TFLOPs: 31.76 | 7: iteration 114600/ 115203 | consumed samples: 29337600 | consumed tokens: 60083404800 | elapsed time per iteration (s): 0.38 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.951908E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 677.315 | TFLOPs: 31.61 | 7: iteration 114700/ 115203 | consumed samples: 29363200 | consumed tokens: 60135833600 | elapsed time per iteration (s): 0.38 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.952371E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 679.798 | TFLOPs: 31.73 | 7: iteration 114800/ 115203 | consumed samples: 29388800 | consumed tokens: 60188262400 | elapsed time per iteration (s): 0.37 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.950070E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.751 | TFLOPs: 31.87 | 7: iteration 114900/ 115203 | consumed samples: 29414400 | consumed tokens: 60240691200 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.954697E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.383 | TFLOPs: 31.85 | 7: iteration 115000/ 115203 | consumed samples: 29440000 | consumed tokens: 
60293120000 | elapsed time per iteration (s): 0.37 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.952015E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.987 | TFLOPs: 31.88 | 7: iteration 115100/ 115203 | consumed samples: 29465600 | consumed tokens: 60345548800 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.950575E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 680.037 | TFLOPs: 31.74 | 7: iteration 115200/ 115203 | consumed samples: 29491200 | consumed tokens: 60397977600 | elapsed time per iteration (s): 0.38 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.949062E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 682.445 | TFLOPs: 31.85 | 0: [after training is done] datetime: 2023-03-17 07:01:43 0: saving checkpoint at iteration 115203 to checkpoints_146m60b100m 7: ----------------------------------------------------------------------------------------------------------------- 7: validation loss at the end of training for val data | lm loss value: 3.784106E+00 | lm loss PPL: 4.399630E+01 | 7: ----------------------------------------------------------------------------------------------------------------- 0: [2023-03-17 07:01:43,793] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step115203 is begin to save! 0: [2023-03-17 07:01:43,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt... 0: [2023-03-17 07:01:43,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_01-model_00-model_states.pt. 
0: [2023-03-17 07:01:43,904] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt... 0: [2023-03-17 07:01:43,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_03-model_00-model_states.pt. 0: [2023-03-17 07:01:43,921] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt... 0: [2023-03-17 07:01:43,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_04-model_00-model_states.pt. 0: [2023-03-17 07:01:43,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt... 0: [2023-03-17 07:01:43,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_05-model_00-model_states.pt. 0: [2023-03-17 07:01:43,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt... 0: [2023-03-17 07:01:43,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_06-model_00-model_states.pt. 0: [2023-03-17 07:01:43,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt... 0: [2023-03-17 07:01:43,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_07-model_00-model_states.pt. 0: [2023-03-17 07:01:43,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt... 0: [2023-03-17 07:01:43,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_08-model_00-model_states.pt. 
0: [2023-03-17 07:01:43,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt... 0: [2023-03-17 07:01:44,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_09-model_00-model_states.pt. 0: [2023-03-17 07:01:44,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt... 0: [2023-03-17 07:01:44,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_10-model_00-model_states.pt. 0: [2023-03-17 07:01:44,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt... 0: [2023-03-17 07:01:44,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_11-model_00-model_states.pt. 0: [2023-03-17 07:01:44,042] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt... 0: [2023-03-17 07:01:44,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_12-model_00-model_states.pt. 0: [2023-03-17 07:01:44,057] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt... 0: [2023-03-17 07:01:44,072] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_13-model_00-model_states.pt. 0: [2023-03-17 07:01:44,072] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt... 0: [2023-03-17 07:01:44,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_14-model_00-model_states.pt. 
0: [2023-03-17 07:01:44,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt... 0: [2023-03-17 07:01:44,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_15-model_00-model_states.pt. 0: [2023-03-17 07:01:44,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt... 0: [2023-03-17 07:01:44,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_16-model_00-model_states.pt. 0: [2023-03-17 07:01:44,117] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt... 0: [2023-03-17 07:01:44,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_17-model_00-model_states.pt. 0: [2023-03-17 07:01:44,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt... 0: [2023-03-17 07:01:44,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/layer_19-model_00-model_states.pt. 0: [2023-03-17 07:01:44,134] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt 0: [2023-03-17 07:01:44,134] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt... 0: [2023-03-17 07:01:44,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/mp_rank_00_model_states.pt. 0: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
3: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
4: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:01:44,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 07:01:44,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:01:44,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 07:01:44,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
0: [2023-03-17 07:01:44,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:01:44,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:01:44,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 07:01:44,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 07:01:44,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:01:44,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 07:01:44,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 07:01:44,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:01:44,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 07:01:44,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 07:01:44,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 07:01:44,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 07:01:44,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 07:01:44,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:01:44,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 07:01:44,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 07:01:44,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:01:44,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 07:01:44,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 07:01:44,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:01:44,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 07:01:44,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 07:01:44,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 07:01:44,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 07:01:44,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 07:01:44,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:01:44,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 07:01:44,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 07:01:44,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:01:44,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 07:01:44,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 07:01:44,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 07:01:44,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2023-03-17 07:01:44,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 07:01:44,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 07:01:44,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 07:01:44,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 07:01:44,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:01:44,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 07:01:44,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 07:01:44,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 07:01:44,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 07:01:44,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 07:01:44,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 07:01:44,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 07:01:44,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 07:01:44,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 07:01:44,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 07:01:44,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 07:01:44,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 07:01:44,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2023-03-17 07:01:44,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 07:01:44,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:01:44,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 07:01:44,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:01:44,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:01:44,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:01:44,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:01:44,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:01:44,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 07:01:44,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 07:01:44,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 07:01:44,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 07:01:44,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 07:01:44,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 07:01:44,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 07:01:44,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 07:01:44,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 07:01:44,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 07:01:44,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 07:01:44,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 07:01:44,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2023-03-17 07:01:44,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 07:01:44,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 07:01:44,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 07:01:44,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
1: [2023-03-17 07:01:44,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 07:01:44,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 07:01:44,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 07:01:44,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 07:01:44,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2023-03-17 07:01:44,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
6: [2023-03-17 07:01:44,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 07:01:44,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 07:01:44,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:01:44,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 07:01:44,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 07:01:44,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 07:01:44,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 07:01:44,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 07:01:44,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
7: [2023-03-17 07:01:44,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 07:01:44,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2023-03-17 07:01:44,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 07:01:44,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 07:01:44,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 07:01:44,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 07:01:44,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 07:01:44,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 07:01:44,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 07:01:44,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 07:01:44,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step115203 is ready now! 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2023-03-17 07:01:44,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 07:01:44,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 07:01:44,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 07:01:44,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 07:01:44,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 07:01:44,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 07:01:44,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 07:01:44,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 07:01:44,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step115203 is ready now! 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2023-03-17 07:01:44,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2023-03-17 07:01:44,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_146m60b100m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 07:01:44,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: successfully saved checkpoint at iteration 115203 to checkpoints_146m60b100m END 3324364: Fri 17 Mar 2023 07:01:57 AM EET